Type: Package
Title: Tidy Access to Women's Tennis Association (WTA) Data
Version: 0.1.0
Description: Scrapes and tidies publicly available data from the Women's Tennis Association website (https://www.wtatennis.com). Provides helpers to retrieve player biographies, singles and doubles career overviews, match histories, live rankings and aggregate statistics. Dynamic pages are rendered through a headless 'Chrome' session so 'JavaScript'-generated content is fully captured, and all outputs are returned as tidy data frames suitable for downstream analysis or visualisation.
License: Apache License (≥ 2)
URL: https://github.com/Angnar-97/matchpointR
BugReports: https://github.com/Angnar-97/matchpointR/issues
Encoding: UTF-8
Depends: R (≥ 4.1.0)
Imports: chromote, cli, jsonlite, magick, purrr, rvest, stringr, tibble, xml2
Suggests: httr2, knitr, rmarkdown, rsvg, testthat (≥ 3.0.0), withr
Config/testthat/edition: 3
RoxygenNote: 7.3.3
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-04-26 14:34:10 UTC; User
Author: Alejandro Navas González [aut, cre] (alias: Angnar)
Maintainer: Alejandro Navas González <angnar@telaris.es>
Repository: CRAN
Date/Publication: 2026-04-28 19:40:12 UTC

matchpointR: Tidy Access to Women's Tennis Association (WTA) Data

Description

matchpointR is a small scraper toolkit that turns the public pages of https://www.wtatennis.com into tidy data frames. It ships helpers for player biographies, career highlights, full match histories and live rankings.

Details

Dynamic content is rendered through a headless Chrome session using the chromote package, so JavaScript-generated sections (matches, rankings) are fully captured before parsing. Where possible the package reads structured JSON-LD (schema.org) data instead of scraping CSS classes, for resilience against site redesigns.
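
The JSON-LD approach described above can be sketched as follows. This is a minimal illustration, not the package's internal code: the HTML snippet, the example.org URL and the "Jane Doe" values are invented placeholders, and only xml2 and jsonlite calls are used.

```r
library(xml2)
library(jsonlite)

# Invented stand-in for a rendered player page (placeholder values only).
html <- '<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person",
 "@id": "https://www.example.org/players/123/jane-doe",
 "name": "Jane Doe", "birthDate": "1996-05-10"}
</script></head><body></body></html>'

doc <- read_html(html)

# Collect every JSON-LD block and parse each one into a plain R list.
nodes  <- xml_find_all(doc, "//script[@type='application/ld+json']")
blocks <- lapply(xml_text(nodes), fromJSON, simplifyVector = FALSE)

# Keep the schema.org Person block and read fields by name, not by CSS class.
is_person <- vapply(
  blocks,
  function(b) identical(b[["@type"]], "Person"),
  logical(1)
)
person <- blocks[is_person][[1]]

person$name
person$birthDate
```

Because the fields are addressed by their schema.org names, a change to the visual markup would not affect this kind of lookup as long as the JSON-LD block itself is kept.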

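Main functions

wta_player_url(): build a canonical player URL from a numeric id and an optional slug.
wta_get_player_basics(): basic bio for a player.
wta_get_player_overview(): career highlights for a player.
wta_get_player_matches(): match history for a player.
wta_get_rankings(): current singles or doubles rankings.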

Author(s)

Maintainer: Alejandro Navas González <angnar@telaris.es> (alias: Angnar)

See Also

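Useful links:

https://github.com/Angnar-97/matchpointR
Report bugs at https://github.com/Angnar-97/matchpointR/issues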


Fetch fully-rendered HTML with chromote

Description

Opens a headless Chrome session via chromote, waits for the page to settle, optionally clicks a "load more" button and/or scrolls, and returns the complete page source.

Usage

.chromote_get_html(
  url,
  wait = 8,
  click_more_selector = NULL,
  scroll = TRUE,
  max_clicks = 50L,
  session = NULL
)

Arguments

url

Character. Destination URL.

wait

Numeric. Seconds to wait after initial navigation. Default 8.

click_more_selector

Optional CSS selector for a "load more" button that should be clicked repeatedly until it disappears.

scroll

Logical. Scroll to the bottom after each click? Default TRUE.

max_clicks

Integer. Safety cap for the click loop. Default 50.

session

Optional pre-existing chromote::ChromoteSession. When supplied it is reused (callers are responsible for closing it).

Value

A character string containing the full page source.
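
The navigate, wait, click and scroll sequence described above might look roughly like the sketch below. It is an assumption about how such a helper could be written with chromote, not the package's actual implementation: the name fetch_rendered_html() is hypothetical, the one-second pause between clicks is an arbitrary choice, and the session-reuse argument is left out for brevity. Running it requires a local Chrome installation.

```r
# Hypothetical helper: a sketch of the behaviour described above, not the package's code.
fetch_rendered_html <- function(url, wait = 8, click_more_selector = NULL,
                                scroll = TRUE, max_clicks = 50L) {
  session <- chromote::ChromoteSession$new()
  on.exit(session$close(), add = TRUE)

  session$Page$navigate(url)
  Sys.sleep(wait)  # let client-side rendering settle

  if (!is.null(click_more_selector)) {
    # JSON-encode the selector so quotes inside it cannot break the JavaScript.
    sel <- as.character(jsonlite::toJSON(click_more_selector, auto_unbox = TRUE))
    click_js <- sprintf(
      "(() => { const el = document.querySelector(%s); if (!el) return false; el.click(); return true; })()",
      sel
    )
    for (i in seq_len(max_clicks)) {
      clicked <- session$Runtime$evaluate(click_js)$result$value
      if (!isTRUE(clicked)) break  # button is gone: everything is loaded
      if (scroll) {
        session$Runtime$evaluate("window.scrollTo(0, document.body.scrollHeight)")
      }
      Sys.sleep(1)  # arbitrary pause so the next batch can render
    }
  }

  # Serialise the live DOM, including JavaScript-generated content.
  session$Runtime$evaluate("document.documentElement.outerHTML")$result$value
}
```

The max_clicks cap keeps the loop finite even if the button never disappears, which is the safety role the argument plays in the documented signature.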


Read dynamic HTML into an xml2 document

Description

Thin wrapper around .chromote_get_html() that parses the rendered HTML with xml2::read_html().

Usage

.read_html_dynamic(
  url,
  wait = 8,
  click_more_selector = NULL,
  scroll = TRUE,
  max_clicks = 50L,
  session = NULL
)

Arguments

url

Character. Destination URL.

wait

Numeric. Seconds to wait after initial navigation. Default 8.

click_more_selector

Optional CSS selector for a "load more" button that should be clicked repeatedly until it disappears.

scroll

Logical. Scroll to the bottom after each click? Default TRUE.

max_clicks

Integer. Safety cap for the click loop. Default 50.

session

Optional pre-existing chromote::ChromoteSession. When supplied it is reused (callers are responsible for closing it).

Value

An xml2::xml_document.


Get basic bio for a WTA player

Description

Parses the profile header of a WTA player page and returns a one-row tibble with name, nationality, birth date, birth place, height and handedness. The bulk of the data is read from the page's JSON-LD (schema.org Person) block, which is more stable than the visual markup; height is read from the profile bio block as a fallback.

Usage

wta_get_player_basics(player_url, download_images = TRUE)

Arguments

player_url

Character. Full URL to a player page. Build it with wta_player_url() if you only have the numeric id.

download_images

Logical. When TRUE (default) the headshot is downloaded into a magick-image object. Set to FALSE to skip the network round-trip and return only the image URL.

Value

A one-row tibble::tibble() with columns:

player_id

Numeric WTA id parsed from @id.

name, given_name, family_name

Name fields.

birth_date

Date of birth (ISO 8601 character).

nationality, birth_place, birth_country

Geography fields.

height

Height string as shown on the bio (e.g. 5' 9" (1.74m)).

handedness

Dominant hand ("Right-Handed" / "Left-Handed").

nationality_code

3-letter IOC/ISO code extracted from the flag image (e.g. "CZE", "USA").

player_image_url, nationality_flag_url

Headshot and flag URLs.

player_image

magick-image of the headshot, when download_images = TRUE.

nationality_flag

magick-image of the flag SVG, when download_images = TRUE and the suggested package rsvg is installed (otherwise NA).

Examples


wta_get_player_basics(wta_player_url(320301, "katerina-siniakova"))
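
Since birth_date and height are returned as character strings, a little post-processing is usually needed before analysis. The sketch below works on a hand-made data frame that only mimics the documented columns; the "Jane Doe" row is an invented placeholder, and the regular expression assumes the height format shown in the column description.

```r
# Invented one-row stand-in for the documented output (placeholder values).
basics <- data.frame(
  name = "Jane Doe",
  birth_date = "1996-05-10",
  height = "5' 9\" (1.74m)",
  stringsAsFactors = FALSE
)

# ISO 8601 strings convert directly to Date.
basics$birth_date <- as.Date(basics$birth_date)

# Pull the metric part out of the parentheses, e.g. "(1.74m)" -> 1.74.
# A string without that pattern would yield NA (with a coercion warning).
basics$height_m <- as.numeric(
  sub(".*\\(([0-9.]+)m\\).*", "\\1", basics$height)
)

basics
```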


Get the match history for a WTA player

Description

Walks the dynamic "Matches" page of a player profile, clicking the "Show more" button until the full history is loaded, and returns one row per match with tournament, round, opponent, score and result.

Usage

wta_get_player_matches(player_url, max_clicks = 50L)

Arguments

player_url

Character. URL to the player page; the function normalises it to the /matches path automatically.

max_clicks

Integer. Safety cap for the "Show more" click loop. Defaults to 50.

Value

A tibble::tibble() with one row per match and columns: tournament, tournament_date, round, opponent, opponent_seed, opponent_country, opponent_rank, score, result.

Examples


url <- wta_player_url(320301, "katerina-siniakova", "matches")
wta_get_player_matches(url)
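
With one row per match, a win count per tournament is a short aggregation. The sketch below uses an invented data frame with a subset of the documented columns; the result codes "W" and "L" are an assumption, since the documentation does not state which values the result column takes.

```r
# Invented matches with a subset of the documented columns.
# The "W"/"L" coding of `result` is an assumption, not documented behaviour.
matches <- data.frame(
  tournament = c("Event A", "Event A", "Event B"),
  round      = c("R32", "R16", "R32"),
  result     = c("W", "L", "W"),
  stringsAsFactors = FALSE
)

# Cross-tabulate results by tournament.
record <- table(matches$tournament, matches$result)

# Wins per tournament as a named vector.
wins <- tapply(matches$result == "W", matches$tournament, sum)

record
wins
```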


Get a WTA player's career highlights

Description

Returns the structured "additional properties" block from the page's JSON-LD: current singles and doubles rank, career titles and career prize money. This is supplemented with the career-high singles rank, which is read from the bio side panel.

Usage

wta_get_player_overview(player_url)

Arguments

player_url

Character. URL to the player overview page.

Value

A long-format tibble::tibble() with columns metric and value. Rows include singles_rank, doubles_rank, singles_career_titles, doubles_career_titles, career_prize_money, career_high.

Examples


wta_get_player_overview(wta_player_url(320301, "katerina-siniakova"))
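
The long metric/value layout is convenient for stacking several players, but a one-row wide layout is often easier to join against other tables. The sketch below reshapes an invented overview with base R; the numbers are placeholders, and treating value as character is an assumption about the column type.

```r
# Invented long-format overview (placeholder values; `value` assumed character).
overview <- data.frame(
  metric = c("singles_rank", "doubles_rank", "career_high"),
  value  = c("40", "1", "27"),
  stringsAsFactors = FALSE
)

# Turn each metric into its own column of a one-row data frame.
wide <- as.data.frame(
  as.list(setNames(overview$value, overview$metric)),
  stringsAsFactors = FALSE
)

# Convert the columns needed for arithmetic or sorting.
wide$singles_rank <- as.integer(wide$singles_rank)

wide
```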


Get the current WTA rankings

Description

Scrapes the rankings table at https://www.wtatennis.com/rankings/singles (or /doubles) and returns a tidy tibble. The page initially renders only the first 50 rows. If the widget has not hydrated by the time the page is read, increase the browser dwell time with wait.

Usage

wta_get_rankings(type = c("singles", "doubles"), top = NULL, wait = 12)

Arguments

type

Character. One of "singles", "doubles". Defaults to "singles".

top

Integer. Limit the output to the top N ranked players. NULL (default) keeps every row rendered by the page.

wait

Numeric. Seconds to wait for the rankings widget to hydrate after navigation. Defaults to 12.

Value

A tibble::tibble() with one row per player and columns: rank, player_id, player, country, age, tournaments_played, points.

Examples


wta_get_rankings("singles", top = 50)
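
When the right dwell time is not known in advance, one option is to retry with progressively longer waits. The helper below is a hypothetical sketch, not part of the package: fetch_with_longer_waits() and the wait schedule are illustrative, and it assumes that a page read before hydration yields either an error or a zero-row result. A stub stands in for the real call so the logic can be followed offline.

```r
# Hypothetical helper: try a fetch function with increasing wait times.
fetch_with_longer_waits <- function(fetch, waits = c(12, 20, 30)) {
  out <- NULL
  for (w in waits) {
    # Treat an error the same way as an empty result and move on.
    out <- tryCatch(fetch(wait = w), error = function(e) NULL)
    if (is.data.frame(out) && nrow(out) > 0) {
      return(out)
    }
  }
  out  # last attempt, possibly empty or NULL
}

# Stub that behaves like a page needing at least 20 seconds to hydrate.
fake_fetch <- function(wait) {
  if (wait < 20) {
    data.frame(rank = integer(0), player = character(0))
  } else {
    data.frame(rank = 1:2, player = c("Player A", "Player B"))
  }
}

rankings <- fetch_with_longer_waits(fake_fetch)

# Against the live site the call would look like:
# fetch_with_longer_waits(
#   function(wait) wta_get_rankings("singles", top = 50, wait = wait)
# )
```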


Build a WTA player URL

Description

Convenience wrapper to assemble a canonical player URL from a numeric id and an optional slug.

Usage

wta_player_url(id, slug = NULL, section = c("overview", "matches"))

Arguments

id

Character or integer. The WTA numeric player id (e.g. 320301).

slug

Optional character. Player slug (e.g. "katerina-siniakova"). When omitted, the URL still resolves because WTA redirects to the canonical one.

section

Optional character. Page section to append as a path segment, one of "overview", "matches". Defaults to "overview", which maps to the bare player URL.

Value

A single character string with the full URL.

Examples

wta_player_url(320301, "katerina-siniakova")
wta_player_url(320301, "katerina-siniakova", "matches")
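
The assembly logic described in the arguments can be approximated as below. This is a sketch under stated assumptions rather than the package source: build_player_url() is a hypothetical name, and the "https://www.wtatennis.com/players" base path is assumed, not taken from the documentation.

```r
# Hypothetical re-implementation for illustration; the base path is an assumption.
build_player_url <- function(id, slug = NULL,
                             section = c("overview", "matches"),
                             base = "https://www.wtatennis.com/players") {
  section <- match.arg(section)

  # format() avoids scientific notation, which as.character() would
  # produce for round numeric ids such as 100000 ("1e+05").
  id <- format(id, scientific = FALSE, trim = TRUE)

  parts <- c(base, id, slug)  # c() silently drops a NULL slug
  if (section != "overview") {
    parts <- c(parts, section)  # "overview" maps to the bare player URL
  }
  paste(parts, collapse = "/")
}

build_player_url(320301, "katerina-siniakova")
build_player_url(320301, "katerina-siniakova", "matches")
```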
