Extracting text from a web page

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Using JavaScript

One way to extract text from a page is to tell the browser to run JavaScript code that does it.

Synchronous version

library(chromote)
b <- ChromoteSession$new()

b$go_to("https://www.whatismybrowser.com/")

# Run JavaScript to extract text from the page
x <- b$Runtime$evaluate('document.querySelector(".corset .string-major a").innerText')
x$result$value
#> [1] "Chrome 75 on macOS (Mojave)"

Asynchronous version

library(chromote)
b <- ChromoteSession$new()

p <- b$Page$loadEventFired(wait_ = FALSE)
b$go_to("https://www.whatismybrowser.com/", wait_ = FALSE)
p$then(function(value) {
  b$Runtime$evaluate(
    'document.querySelector(".corset .string-major a").innerText'
  )
})$
then(function(value) {
  print(value$result$value)
})

Using Chrome DevTools Protocol commands

Another way is to use CDP commands to extract content from the DOM. This does not require executing JavaScript in the browser’s context, but it is also not as flexible as JavaScript.

Synchronous version

library(chromote)
b <- ChromoteSession$new()

b$go_to("https://www.whatismybrowser.com/")
x <- b$DOM$getDocument()
x <- b$DOM$querySelector(x$root$nodeId, ".corset .string-major a")
b$DOM$getOuterHTML(x$nodeId)
#> $outerHTML
#> [1] "<a href=\"/detect/what-version-of-chrome-do-i-have\">Chrome 75 on macOS (Mojave)</a>"

Asynchronous version

library(chromote)
b <- ChromoteSession$new()

b$go_to("https://www.whatismybrowser.com/", wait_ = FALSE)$
then(function(value) {
  b$DOM$getDocument()
})$
then(function(value) {
  b$DOM$querySelector(value$root$nodeId, ".corset .string-major a")
})$
then(function(value) {
  b$DOM$getOuterHTML(value$nodeId)
})$
then(function(value) {
  print(value)
})

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.