
Scrape Michigan Lakes

Jemma Stachelek

2022-12-27

library(wikilake)
## Loading required package: maps

Generate list of Michigan Lakes

Get Wikipedia URL of Category

res <- WikipediR::page_info("en", "wikipedia",
  page = "Category:Lakes of Michigan")

Scrape lake names

res <- xml2::read_html(res$query$pages[[1]]$canonicalurl)
res <- rvest::html_nodes(res, "#mw-pages .mw-category")
res <- rvest::html_nodes(res, "li")
res <- rvest::html_nodes(res, "a")
res <- rvest::html_attr(res, "title")
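
The four chained selections above can be tried offline on a small HTML fragment. The sketch below is only an illustration: the `html` string is a made-up stand-in for the category page (its links and titles are hypothetical), and the single combined CSS selector is assumed to be equivalent to the chained `html_nodes()` calls because each call selects descendants of the previous result.

```r
# Hypothetical HTML fragment imitating the layout of a category page
html <- '<div id="mw-pages"><div class="mw-category"><ul>
  <li><a href="/wiki/Lake_Gogebic" title="Lake Gogebic">Lake Gogebic</a></li>
  <li><a href="/wiki/Torch_Lake" title="Torch Lake">Torch Lake</a></li>
</ul></div></div>'

# Parse the literal string, then select every link inside the category list
page <- xml2::read_html(html)
links <- rvest::html_nodes(page, "#mw-pages .mw-category li a")

# The title attribute holds the article name
lake_titles <- rvest::html_attr(links, "title")
lake_titles
```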

Remove junk names

res <- res[!(seq_along(res) %in% grep("List", res))]
res <- res[!(seq_along(res) %in% grep("Watershed", res))]
res <- res[!(seq_along(res) %in% grep("lakes", res))]
res <- res[!(seq_along(res) %in% grep("Mud Lake", res))]
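
The four filters above can also be written as a single pattern match. This is a minimal sketch on a made-up vector; the entries in `titles` are illustrative and are not actual scraped output.

```r
# Hypothetical page titles standing in for the scraped `res` vector
titles <- c("Lake Gogebic", "List of lakes of Michigan", "Mud Lake (Michigan)",
  "Torch Lake (Antrim County, Michigan)", "Clinton River Watershed")

# Same four terms as above, joined with "|" (regular expression alternation)
junk <- "List|Watershed|lakes|Mud Lake"

# grepl() returns one logical per title, so it can be negated directly
keep <- titles[!grepl(junk, titles)]
keep
```

The match is case-sensitive, so "lakes" drops list-style pages without removing titles that contain "Lake".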

Scrape tables

res <- lapply(res, lake_wiki)

# remove missing coordinates
res <- res[unlist(lapply(res, function(x) !is.na(x[, "Lat"])))]
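
The coordinate filter can be checked on toy data. The sketch below assumes each list element is a one-row data frame with a `Lat` column, as the code above implies; the `lakes` list is hypothetical and does not come from `lake_wiki`.

```r
# Hypothetical one-row results standing in for the scraped list
lakes <- list(
  data.frame(Name = "A", Lat = 45.1, Lon = -84.2),
  data.frame(Name = "B", Lat = NA, Lon = NA),
  data.frame(Name = "C", Lat = 43.7, Lon = -85.9)
)

# vapply() enforces exactly one logical per element;
# indexing row 1 guards against results with more than one row
has_coords <- vapply(lakes, function(x) !is.na(x[1, "Lat"]), logical(1))
lakes <- lakes[has_coords]
length(lakes)
```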

Collapse list to data.frame

res_df_names <- unique(unlist(lapply(res, names)))
res_df <- data.frame(matrix(NA, nrow = length(res),
  ncol = length(res_df_names)))
names(res_df) <- res_df_names
for (i in seq_along(res)) {
  dt_pad <- data.frame(matrix(NA, nrow = 1,
    ncol = length(res_df_names) - ncol(res[[i]])))
  names(dt_pad) <- res_df_names[!(res_df_names %in% names(res[[i]]))]
  dt <- cbind(res[[i]], dt_pad)
  dt <- dt[, res_df_names]
  res_df[i, ] <- dt
}
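
The padding step can also be sketched with a small helper and `rbind`. This is an alternative formulation on toy data, not the vignette's own method; `lakes`, `all_names`, and `pad` are hypothetical names introduced for illustration.

```r
# Hypothetical one-row data frames with differing columns
lakes <- list(
  data.frame(Name = "A", Lat = 45.1, Lon = -84.2),
  data.frame(Name = "B", Lat = 43.7, Lon = -85.9, `Max. depth` = 12,
    check.names = FALSE)
)

# Union of all column names, in order of first appearance
all_names <- unique(unlist(lapply(lakes, names)))

# Add each missing column as NA, then put columns in the common order
pad <- function(x) {
  for (nm in setdiff(all_names, names(x))) x[[nm]] <- NA
  x[, all_names, drop = FALSE]
}

# Stack the padded rows into one data.frame
lakes_df <- do.call(rbind, lapply(lakes, pad))
lakes_df
```

Building the rows first and binding once avoids pre-allocating an all-`NA` data frame and assigning into it row by row.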

Map lakes

library(sp)

coordinates(res_df) <- ~ Lon + Lat
map("state", region = "michigan", mar = c(0, 0, 0, 0))
points(res_df, col = "red", pch = 19)

hist(log(res_df$`Max. depth`), main = "", xlab = "Max depth (log(m))")
