library(googlesheets)
suppressMessages(library(dplyr))
The gs_ls()
function returns the sheets you would see in your Google Sheets home screen: https://docs.google.com/spreadsheets/. This should include sheets that you own and may also show sheets owned by others but that you are permitted to access, if you visited the sheet in the browser. Expect a prompt to authenticate yourself in the browser at this point (more below re: authentication).
(my_sheets <- gs_ls())
# (expect a prompt to authenticate with Google interactively HERE)
my_sheets %>% glimpse()
Don't worry if you don't have any suitable Google Sheets lying around! We've published a sheet for you to practice with and have built functions into googlesheets
to help you access it. The example sheet holds some of the Gapminder data; feel free to visit the Sheet in the browser. The code below will put a copy of this sheet into your Drive, titled “Gapminder”.
gs_gap() %>%
gs_copy(to = "Gapminder")
If that seems to have worked, go check for a sheet named “Gapminder” in your Google Sheets home screen: https://docs.google.com/spreadsheets/. You could also run gs_ls()
again and make sure the Gapminder sheet is listed.
If you plan to consume data from a sheet or edit it, you must first register it. This is how googlesheets
records important info about the sheet that is required downstream by the Google Sheets or Google Drive APIs. Once registered, you can print the result to get some basic info about the sheet.
googlesheets
provides several registration functions. Specifying the sheet by title? Use gs_title()
. By key? Use gs_key()
. You get the idea.
We're using the built-in functions gs_gap_key()
and gs_gap_url()
to produce the key and browser URL for the Gapminder example sheet, so you can see how this will play out with your own projects.
gap <- gs_title("Gapminder")
gap
# Need to access a sheet you do not own?
# Access it by key if you know it!
(GAP_KEY <- gs_gap_key())
third_party_gap <- GAP_KEY %>%
gs_key()
# Need to access a sheet you do not own but you have a sharing link?
# Access it by URL!
(GAP_URL <- gs_gap_url())
third_party_gap <- GAP_URL %>%
gs_url()
# note: registration via URL may not work for "old" sheets
# Worried that a spreadsheet's registration is out-of-date?
# Re-register it!
gap <- gap %>% gs_gs()
The registration functions gs_title()
, gs_key()
, gs_url()
, and gs_gs()
return a registered sheet as a googlesheet
object, which is the first argument to practically every function in this package. Likewise, almost every function returns a freshly registered googlesheet
object, ready to be stored or piped into the next command.
We export a utility function, extract_key_from_url()
, to help you get and store the key from a browser URL. Registering via browser URL is fine, but registering by key is probably a better idea in the long-run.
If you want to consume the data in a worksheet and get something rectangular back, use the all-purpose function gs_read()
. By default, it reads all the data in a worksheet.
oceania <- gap %>% gs_read(ws = "Oceania")
#> Accessing worksheet titled "Oceania"
oceania
#> Source: local data frame [24 x 6]
#>
#> country continent year lifeExp pop gdpPercap
#> 1 Australia Oceania 1952 69.12 8691212 10039.60
#> 2 Australia Oceania 1957 70.33 9712569 10949.65
#> 3 Australia Oceania 1962 70.93 10794968 12217.23
#> 4 Australia Oceania 1967 71.10 11872264 14526.12
#> 5 Australia Oceania 1972 71.93 13177000 16788.63
#> 6 Australia Oceania 1977 73.49 14074100 18334.20
#> 7 Australia Oceania 1982 74.74 15184200 19477.01
#> 8 Australia Oceania 1987 76.32 16257249 21888.89
#> 9 Australia Oceania 1992 77.56 17481977 23424.77
#> 10 Australia Oceania 1997 78.83 18565243 26997.94
#> .. ... ... ... ... ... ...
str(oceania)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 24 obs. of 6 variables:
#> $ country : chr "Australia" "Australia" "Australia" "Australia" ...
#> $ continent: chr "Oceania" "Oceania" "Oceania" "Oceania" ...
#> $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#> $ lifeExp : num 69.1 70.3 70.9 71.1 71.9 ...
#> $ pop : int 8691212 9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243 ...
#> $ gdpPercap: num 10040 10950 12217 14526 16789 ...
glimpse(oceania)
#> Observations: 24
#> Variables:
#> $ country (chr) "Australia", "Australia", "Australia", "Australia", ...
#> $ continent (chr) "Oceania", "Oceania", "Oceania", "Oceania", "Oceania...
#> $ year (int) 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
#> $ lifeExp (dbl) 69.120, 70.330, 70.930, 71.100, 71.930, 73.490, 74.7...
#> $ pop (int) 8691212, 9712569, 10794968, 11872264, 13177000, 1407...
#> $ gdpPercap (dbl) 10039.60, 10949.65, 12217.23, 14526.12, 16788.63, 18...
You can target specific cells via the range =
argument. The simplest usage is to specify an Excel-like cell range, such as range = “D12:F15” or range = “R1C12:R6C15”. The cell rectangle can be specified in various other ways, using helper functions.
gap %>% gs_read(ws = 2, range = "A1:D8")
#> Accessing worksheet titled "Americas"
#> Source: local data frame [7 x 4]
#>
#> country continent year lifeExp
#> 1 Argentina Americas 1952 62.485
#> 2 Argentina Americas 1957 64.399
#> 3 Argentina Americas 1962 65.142
#> 4 Argentina Americas 1967 65.634
#> 5 Argentina Americas 1972 67.065
#> 6 Argentina Americas 1977 68.481
#> 7 Argentina Americas 1982 69.942
gap %>% gs_read(ws = "Europe", range = cell_rows(1:4))
#> Accessing worksheet titled "Europe"
#> Source: local data frame [3 x 6]
#>
#> country continent year lifeExp pop gdpPercap
#> 1 Albania Europe 1952 55.23 1282697 1601.056
#> 2 Albania Europe 1957 59.28 1476505 1942.284
#> 3 Albania Europe 1962 64.82 1728137 2312.889
gap %>% gs_read(ws = "Europe", range = cell_rows(100:103), col_names = FALSE)
#> Accessing worksheet titled "Europe"
#> Source: local data frame [4 x 6]
#>
#> X1 X2 X3 X4 X5 X6
#> 1 Finland Europe 1962 68.75 4491443 9371.843
#> 2 Finland Europe 1967 69.83 4605744 10921.636
#> 3 Finland Europe 1972 70.87 4639657 14358.876
#> 4 Finland Europe 1977 72.52 4738902 15605.423
gap %>% gs_read(ws = "Africa", range = cell_cols(1:4))
#> Accessing worksheet titled "Africa"
#> Source: local data frame [624 x 4]
#>
#> country continent year lifeExp
#> 1 Algeria Africa 1952 43.077
#> 2 Algeria Africa 1957 45.685
#> 3 Algeria Africa 1962 48.303
#> 4 Algeria Africa 1967 51.407
#> 5 Algeria Africa 1972 54.518
#> 6 Algeria Africa 1977 58.014
#> 7 Algeria Africa 1982 61.368
#> 8 Algeria Africa 1987 65.799
#> 9 Algeria Africa 1992 67.744
#> 10 Algeria Africa 1997 69.152
#> .. ... ... ... ...
gap %>% gs_read(ws = "Asia", range = cell_limits(c(1, 4), c(5, NA)))
#> Accessing worksheet titled "Asia"
#> Source: local data frame [4 x 3]
#>
#> lifeExp pop gdpPercap
#> 1 28.801 8425333 779.4453
#> 2 30.332 9240934 820.8530
#> 3 31.997 10267083 853.1007
#> 4 34.020 11537966 836.1971
gs_read()
is a wrapper that bundles together the most common methods to read data from the API and transform it for downstream use. You can refine it's behavior further, by passing more arguments via ...
. Read the help file for more details.
If gs_read()
doesn't do what you need, then keep reading for the underlying functions to read and post-process data.
There are three ways to consume data from a worksheet within a Google spreadsheet. The order goes from fastest-but-more-limited to slowest-but-most-flexible:
gs_read_csv()
: Don't let the name scare you! Nothing is written to file during this process. The name just reflects that, under the hood, we request the data via the “exportcsv” link. For cases where gs_read_csv()
and gs_read_listfeed()
both work, we see that gs_read_csv()
is around 50 times faster. Use this when your data occupies a nice rectangle in the sheet and you're willing to consume all of it. You will get a tbl_df
back, which is basically just a data.frame
. In fact, you might want to use gs_read_csv()
, it in other, less tidy scenarios and do further munging in R.gs_read_listfeed()
: Gets data via the “list feed”, which consumes data row-by-row. Like gs_read_csv()
, this is appropriate when your data occupies a nice rectangle. You will again get a tbl_df
back, but your variable names may have been mangled (by Google, not us!). Specifically, variable names will be forcefully lowercased and all non-alpha-numeric characters will be removed. Why do we even have this function? The list feed supports some query parameters for sorting and filtering the data, which we plan to support (#17).gs_read_cellfeed()
: Get data via the “cell feed”, which consumes data cell-by-cell. This is appropriate when you want to consume arbitrary cells, rows, columns, and regions of the sheet. It is invoked by gs_read()
whenever the range =
argument is used. It works great for modest amounts of data but can be rather slow otherwise. gs_read_cellfeed()
returns a tbl_df
with one row per cell. You can target specific cells via the range
argument. See below for demos of gs_reshape_cellfeed()
and gs_simplify_cellfeed()
which help with post-processing.# Get the data for worksheet "Oceania": the super-fast csv way
oceania_csv <- gap %>% gs_read_csv(ws = "Oceania")
#> Accessing worksheet titled "Oceania"
str(oceania_csv)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 24 obs. of 6 variables:
#> $ country : chr "Australia" "Australia" "Australia" "Australia" ...
#> $ continent: chr "Oceania" "Oceania" "Oceania" "Oceania" ...
#> $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#> $ lifeExp : num 69.1 70.3 70.9 71.1 71.9 ...
#> $ pop : int 8691212 9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243 ...
#> $ gdpPercap: num 10040 10950 12217 14526 16789 ...
oceania_csv
#> Source: local data frame [24 x 6]
#>
#> country continent year lifeExp pop gdpPercap
#> 1 Australia Oceania 1952 69.12 8691212 10039.60
#> 2 Australia Oceania 1957 70.33 9712569 10949.65
#> 3 Australia Oceania 1962 70.93 10794968 12217.23
#> 4 Australia Oceania 1967 71.10 11872264 14526.12
#> 5 Australia Oceania 1972 71.93 13177000 16788.63
#> 6 Australia Oceania 1977 73.49 14074100 18334.20
#> 7 Australia Oceania 1982 74.74 15184200 19477.01
#> 8 Australia Oceania 1987 76.32 16257249 21888.89
#> 9 Australia Oceania 1992 77.56 17481977 23424.77
#> 10 Australia Oceania 1997 78.83 18565243 26997.94
#> .. ... ... ... ... ... ...
# Get the data for worksheet "Oceania": the less-fast tabular way ("list feed")
oceania_list_feed <- gap %>% gs_read_listfeed(ws = "Oceania")
#> Accessing worksheet titled "Oceania"
str(oceania_list_feed)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 24 obs. of 6 variables:
#> $ country : chr "Australia" "Australia" "Australia" "Australia" ...
#> $ continent: chr "Oceania" "Oceania" "Oceania" "Oceania" ...
#> $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#> $ lifeexp : num 69.1 70.3 70.9 71.1 71.9 ...
#> $ pop : int 8691212 9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243 ...
#> $ gdppercap: num 10040 10950 12217 14526 16789 ...
oceania_list_feed
#> Source: local data frame [24 x 6]
#>
#> country continent year lifeexp pop gdppercap
#> 1 Australia Oceania 1952 69.12 8691212 10039.60
#> 2 Australia Oceania 1957 70.33 9712569 10949.65
#> 3 Australia Oceania 1962 70.93 10794968 12217.23
#> 4 Australia Oceania 1967 71.10 11872264 14526.12
#> 5 Australia Oceania 1972 71.93 13177000 16788.63
#> 6 Australia Oceania 1977 73.49 14074100 18334.20
#> 7 Australia Oceania 1982 74.74 15184200 19477.01
#> 8 Australia Oceania 1987 76.32 16257249 21888.89
#> 9 Australia Oceania 1992 77.56 17481977 23424.77
#> 10 Australia Oceania 1997 78.83 18565243 26997.94
#> .. ... ... ... ... ... ...
# Get the data for worksheet "Oceania": the slow cell-by-cell way ("cell feed")
oceania_cell_feed <- gap %>% gs_read_cellfeed(ws = "Oceania")
#> Accessing worksheet titled "Oceania"
str(oceania_cell_feed)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 150 obs. of 5 variables:
#> $ cell : chr "A1" "B1" "C1" "D1" ...
#> $ cell_alt : chr "R1C1" "R1C2" "R1C3" "R1C4" ...
#> $ row : int 1 1 1 1 1 1 2 2 2 2 ...
#> $ col : int 1 2 3 4 5 6 1 2 3 4 ...
#> $ cell_text: chr "country" "continent" "year" "lifeExp" ...
#> - attr(*, "ws_title")= chr "Oceania"
oceania_cell_feed
#> Source: local data frame [150 x 5]
#>
#> cell cell_alt row col cell_text
#> 1 A1 R1C1 1 1 country
#> 2 B1 R1C2 1 2 continent
#> 3 C1 R1C3 1 3 year
#> 4 D1 R1C4 1 4 lifeExp
#> 5 E1 R1C5 1 5 pop
#> 6 F1 R1C6 1 6 gdpPercap
#> 7 A2 R2C1 2 1 Australia
#> 8 B2 R2C2 2 2 Oceania
#> 9 C2 R2C3 2 3 1952
#> 10 D2 R2C4 2 4 69.12
#> .. ... ... ... ... ...
Let's consume all the data for Africa by all 3 methods and see how long it takes.
jfun <- function(readfun)
system.time(do.call(readfun, list(gs_gap(), ws = "Africa", verbose = FALSE)))
readfuns <- c("gs_read_csv", "gs_read_listfeed", "gs_read_cellfeed")
readfuns <- sapply(readfuns, get, USE.NAMES = TRUE)
sapply(readfuns, jfun)
#> gs_read_csv gs_read_listfeed gs_read_cellfeed
#> user.self 0.030 0.141 1.057
#> sys.self 0.001 0.017 0.040
#> elapsed 1.737 1.761 3.104
#> user.child 0.000 0.000 0.000
#> sys.child 0.000 0.000 0.000
If you consume data from the cell feed with gs_read_cellfeed(..., range = ...)
, you get a data.frame back with one row per cell. The package offers two functions to post-process this into something more useful, gs_reshape_cellfeed()
and gs_simplify_cellfeed()
.
To reshape into a table, use gs_reshape_cellfeed()
. You can signal that the first row contains column names (or not) with col_names = TRUE
(or FALSE
). Or you can provide a character vector of names. This is inspired by the col_names
argument of readxl::read_excel()
and readr::read_delim()
, which generalizes the header
argument of read.table()
.
# Reshape: instead of one row per cell, make a nice rectangular data.frame
australia_cell_feed <- gap %>%
gs_read_cellfeed(ws = "Oceania", range = "A1:F13")
#> Accessing worksheet titled "Oceania"
str(australia_cell_feed)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 78 obs. of 5 variables:
#> $ cell : chr "A1" "B1" "C1" "D1" ...
#> $ cell_alt : chr "R1C1" "R1C2" "R1C3" "R1C4" ...
#> $ row : int 1 1 1 1 1 1 2 2 2 2 ...
#> $ col : int 1 2 3 4 5 6 1 2 3 4 ...
#> $ cell_text: chr "country" "continent" "year" "lifeExp" ...
#> - attr(*, "ws_title")= chr "Oceania"
oceania_cell_feed
#> Source: local data frame [150 x 5]
#>
#> cell cell_alt row col cell_text
#> 1 A1 R1C1 1 1 country
#> 2 B1 R1C2 1 2 continent
#> 3 C1 R1C3 1 3 year
#> 4 D1 R1C4 1 4 lifeExp
#> 5 E1 R1C5 1 5 pop
#> 6 F1 R1C6 1 6 gdpPercap
#> 7 A2 R2C1 2 1 Australia
#> 8 B2 R2C2 2 2 Oceania
#> 9 C2 R2C3 2 3 1952
#> 10 D2 R2C4 2 4 69.12
#> .. ... ... ... ... ...
australia_reshaped <- australia_cell_feed %>% gs_reshape_cellfeed()
str(australia_reshaped)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 12 obs. of 6 variables:
#> $ country : chr "Australia" "Australia" "Australia" "Australia" ...
#> $ continent: chr "Oceania" "Oceania" "Oceania" "Oceania" ...
#> $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
#> $ lifeExp : num 69.1 70.3 70.9 71.1 71.9 ...
#> $ pop : int 8691212 9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243 ...
#> $ gdpPercap: num 10040 10950 12217 14526 16789 ...
australia_reshaped
#> Source: local data frame [12 x 6]
#>
#> country continent year lifeExp pop gdpPercap
#> 1 Australia Oceania 1952 69.120 8691212 10039.60
#> 2 Australia Oceania 1957 70.330 9712569 10949.65
#> 3 Australia Oceania 1962 70.930 10794968 12217.23
#> 4 Australia Oceania 1967 71.100 11872264 14526.12
#> 5 Australia Oceania 1972 71.930 13177000 16788.63
#> 6 Australia Oceania 1977 73.490 14074100 18334.20
#> 7 Australia Oceania 1982 74.740 15184200 19477.01
#> 8 Australia Oceania 1987 76.320 16257249 21888.89
#> 9 Australia Oceania 1992 77.560 17481977 23424.77
#> 10 Australia Oceania 1997 78.830 18565243 26997.94
#> 11 Australia Oceania 2002 80.370 19546792 30687.75
#> 12 Australia Oceania 2007 81.235 20434176 34435.37
# Example: first 3 rows
gap_3rows <- gap %>% gs_read_cellfeed("Europe", range = cell_rows(1:3))
#> Accessing worksheet titled "Europe"
gap_3rows %>% head()
#> Source: local data frame [6 x 5]
#>
#> cell cell_alt row col cell_text
#> 1 A1 R1C1 1 1 country
#> 2 B1 R1C2 1 2 continent
#> 3 C1 R1C3 1 3 year
#> 4 D1 R1C4 1 4 lifeExp
#> 5 E1 R1C5 1 5 pop
#> 6 F1 R1C6 1 6 gdpPercap
# convert to a data.frame (by default, column names found in first row)
gap_3rows %>% gs_reshape_cellfeed()
#> Source: local data frame [2 x 6]
#>
#> country continent year lifeExp pop gdpPercap
#> 1 Albania Europe 1952 55.23 1282697 1601.056
#> 2 Albania Europe 1957 59.28 1476505 1942.284
# arbitrary cell range, column names no longer available in first row
gap %>%
gs_read_cellfeed("Oceania", range = "D12:F15") %>%
gs_reshape_cellfeed(col_names = FALSE)
#> Accessing worksheet titled "Oceania"
#> Source: local data frame [4 x 3]
#>
#> X4 X5 X6
#> 1 80.370 19546792 30687.75
#> 2 81.235 20434176 34435.37
#> 3 69.390 1994794 10556.58
#> 4 70.260 2229407 12247.40
# arbitrary cell range, direct specification of column names
gap %>%
gs_read_cellfeed("Oceania", range = cell_limits(c(2, 1), c(5, 3))) %>%
gs_reshape_cellfeed(col_names = paste("thing", c("one", "two", "three"),
sep = "_"))
#> Accessing worksheet titled "Oceania"
#> Source: local data frame [4 x 3]
#>
#> thing_one thing_two thing_three
#> 1 Australia Oceania 1952
#> 2 Australia Oceania 1957
#> 3 Australia Oceania 1962
#> 4 Australia Oceania 1967
To extract the cell data into an atomic vector, possibly named, use gs_simplify_cellfeed()
. You can signal that the first row contains column names (or not) with col_names = TRUE
(or FALSE
). There are several arguments to control conversion.
# Example: first row only
gap_1row <- gap %>% gs_read_cellfeed("Europe", range = cell_rows(1))
#> Accessing worksheet titled "Europe"
gap_1row
#> Source: local data frame [6 x 5]
#>
#> cell cell_alt row col cell_text
#> 1 A1 R1C1 1 1 country
#> 2 B1 R1C2 1 2 continent
#> 3 C1 R1C3 1 3 year
#> 4 D1 R1C4 1 4 lifeExp
#> 5 E1 R1C5 1 5 pop
#> 6 F1 R1C6 1 6 gdpPercap
# convert to a named character vector
gap_1row %>% gs_simplify_cellfeed()
#> A1 B1 C1 D1 E1 F1
#> "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
# Example: single column
gap_1col <- gap %>% gs_read_cellfeed("Europe", range = cell_cols(3))
#> Accessing worksheet titled "Europe"
gap_1col
#> Source: local data frame [361 x 5]
#>
#> cell cell_alt row col cell_text
#> 1 C1 R1C3 1 3 year
#> 2 C2 R2C3 2 3 1952
#> 3 C3 R3C3 3 3 1957
#> 4 C4 R4C3 4 3 1962
#> 5 C5 R5C3 5 3 1967
#> 6 C6 R6C3 6 3 1972
#> 7 C7 R7C3 7 3 1977
#> 8 C8 R8C3 8 3 1982
#> 9 C9 R9C3 9 3 1987
#> 10 C10 R10C3 10 3 1992
#> .. ... ... ... ... ...
# convert to a un-named character vector and drop the variable name
gap_1col %>% gs_simplify_cellfeed(notation = "none", col_names = TRUE)
#> [1] "year" "1952" "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992"
#> [11] "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982"
#> [21] "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972"
#> [31] "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962"
#> [41] "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952"
#> [51] "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002"
#> [61] "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992"
#> [71] "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982"
#> [81] "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972"
#> [91] "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962"
#> [101] "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952"
#> [111] "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002"
#> [121] "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992"
#> [131] "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982"
#> [141] "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972"
#> [151] "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962"
#> [161] "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952"
#> [171] "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002"
#> [181] "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992"
#> [191] "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982"
#> [201] "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972"
#> [211] "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962"
#> [221] "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952"
#> [231] "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002"
#> [241] "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992"
#> [251] "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982"
#> [261] "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972"
#> [271] "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962"
#> [281] "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952"
#> [291] "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002"
#> [301] "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992"
#> [311] "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972" "1977" "1982"
#> [321] "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962" "1967" "1972"
#> [331] "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952" "1957" "1962"
#> [341] "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002" "2007" "1952"
#> [351] "1957" "1962" "1967" "1972" "1977" "1982" "1987" "1992" "1997" "2002"
#> [361] "2007"
You can use googlesheets
to create new spreadsheets.
foo <- gs_new("foo")
foo
By default, there will be an empty worksheet called “Sheet1”, but you can control it's title, extent, and initial data with additional arguments to gs_new()
(see gs_edit_cells()
in the next section). You can also add, rename, and delete worksheets within an existing sheet via gs_ws_new()
, gs_ws_rename()
, and gs_ws_delete()
. Copy an entire spreadsheet with gs_copy()
.
There are two ways to edit cells within an existing worksheet of an existing spreadsheet:
gs_edit_cells()
can write into an arbitrary cell rectanglegs_add_row()
can add a new row to the bottom of an existing cell rectangleIf you have the choice, gs_add_row()
is faster, but it can only be used when your data occupies a very neat rectangle in the upper left corner of the sheet. It relies on the list feed. gs_edit_cells()
relies on batch editing on the cell feed.
We'll work within the completely empty sheet created above, foo
. If your edit populates the sheet with everything it should have, set trim = TRUE
and we will resize the sheet to match the data. Then the nominal worksheet extent is much more informative (vs. the default of 1000 rows and 26 columns) and any future consumption via the cell feed will be much faster.
## foo <- gs_new("foo")
## initialize the worksheets
foo <- foo %>% gs_ws_new("edit_cells")
foo <- foo %>% gs_ws_new("add_row")
## add first six rows of iris data (and var names) into a blank sheet
foo <- foo %>%
gs_edit_cells(ws = "edit_cells", input = head(iris), trim = TRUE)
## initialize sheet with column headers and one row of data
## the list feed is picky about this
foo <- foo %>%
gs_edit_cells(ws = "add_row", input = head(iris, 1), trim = TRUE)
## add the next 5 rows of data ... careful not to go too fast
for(i in 2:6) {
foo <- foo %>% gs_add_row(ws = "add_row", input = iris[i, ])
Sys.sleep(0.3)
}
## let's inspect out work
foo %>% gs_read(ws = "edit_cells")
foo %>% gs_read(ws = "add_row")
Go to your Google Sheets home screen, find the new sheet foo
and look at it. You should see some iris data in the worksheets named edit_cells
and add_row
.
Note how we always store the returned value from gs_edit_cells()
(and all other sheet editing functions). That's because the registration info changes whenever we edit the sheet and we re-register it inside these functions, so this idiom will help you make sequential edits and queries to the same sheet.
Read the function documentation for gs_edit_cells()
for how to specify where the data goes, via an anchor cell, and in which direction, via the shape of the input or the byrow =
argument.
Let's clean up by deleting the foo
spreadsheet we've been playing with.
gs_delete(foo)
If you'd rather specify sheets for deletion by title, look at gs_grepdel()
and gs_vecdel()
. These functions also allow the deletion of multiple sheets at once.
Here's how we can create a new spreadsheet from a suitable local file. First, we'll write then upload a comma-delimited excerpt from the iris data.
iris %>%
head(5) %>%
write.csv("iris.csv", row.names = FALSE)
iris_ss <- gs_upload("iris.csv")
iris_ss
iris_ss %>% gs_read()
file.remove("iris.csv")
Now we'll upload a multi-sheet Excel workbook. Slowly.
gap_xlsx <- gs_upload(system.file("mini-gap.xlsx", package = "googlesheets"))
gap_xlsx
gap_xlsx %>% gs_read(ws = "Asia")
And we clean up after ourselves on Google Drive.
gs_vecdel(c("iris", "mini-gap"))
## achieves same as:
## gs_delete(iris_ss)
## gs_delete(gap_xlsx)
You can download a Google Sheet as a csv, pdf, or xlsx file. Downloading the spreadsheet as a csv file will export the first worksheet (default) unless another worksheet is specified.
gs_title("Gapminder") %>%
gs_download(ws = "Africa", to = "gapminder-africa.csv")
## is it there? yes!
read.csv("gapminder-africa.csv") %>% head()
Download the entire spreadsheet as an Excel workbook.
gs_title("Gapminder") %>%
gs_download(to = "gapminder.xlsx")
Go check it out in Excel, if you wish!
And now we clean up the downloaded files.
file.remove(c("gapminder.xlsx", "gapminder-africa.csv"))
If you use a function that requires authentication, it will be auto-triggered. But you can also initiate the process explicitly if you wish, like so:
# Give googlesheets permission to access your spreadsheets and google drive
gs_auth()
Use gs_auth(new_user = TRUE)
, to force the process to begin anew. Otherwise, the credentials left behind will be used to refresh your access token as needed.
The function gs_user()
will print and return some information about the current authenticated user and session.
user_session_info <- gs_user()
#> No authorization yet in this session!
#> NOTE: a .httr-oauth file exists in current working directory.
#> Run gs_auth() to use the credentials cached in .httr-oauth for this session.
#> No user currently authorized.
user_session_info
#> NULL
In March 2014 Google introduced “new” Sheets. “New” Sheets and “old” sheets behave quite differently with respect to access via API and present a big headache for us. Recently, we've noted that Google is forcibly converting sheets: all “old” Sheets will be switched over the “new” sheets during 2015. However there are still “old” sheets lying around, so we've made some effort to support them, when it's easy to do so. But keep your expectations low.
In particular, gs_read_csv()
does not currently work for “old” sheets.