
rdomains: Get the category of content hosted by a domain

Install and Load the package

The latest development version of the package is always on GitHub. Installing from GitHub requires the devtools package:

# install.packages("devtools")
devtools::install_github("themains/rdomains")

To install the package from CRAN, type in:

install.packages("rdomains")

Next, load the package:

library(rdomains)

Shalla

To get the content category from Shallalist (the service has been discontinued, so the package uses archived data), first download the archived data:

get_shalla_data()

And then, get the category using:

shalla_cat("http://www.google.com")
##   domain_name shalla_category
## 1  google.com   searchengines
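
The output above shows the input URL reduced to a bare domain name. If you want to pre-clean a list of URLs in the same spirit before passing them in, a minimal base R sketch follows. `strip_to_host()` is a hypothetical helper written for illustration; it is not the package's own parser and does not handle public suffixes or subdomains other than a leading `www.`.

```r
# Hypothetical helper (not part of rdomains): reduce a URL to a bare host name
strip_to_host <- function(url) {
  host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)  # drop the scheme, e.g. "http://"
  host <- sub("[/?#].*$", "", host)                    # drop path, query, and fragment
  host <- sub(":[0-9]+$", "", host)                    # drop a trailing port number
  host <- sub("^www\\.", "", host)                     # drop a leading "www."
  tolower(host)                                        # host names are case-insensitive
}

hosts <- strip_to_host(c("http://www.google.com", "https://Example.org:8080/a/b?q=1"))
print(hosts)
```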

DMOZ

To get the content category from DMOZ, first download the archived, parsed CSV file:

get_dmoz_data()

And then, get the category using:

dmoz_cat("http://www.google.com")

ML

To get the probability that a domain hosts adult content, based on features of the domain name and suffix alone:

adult_ml1_cat("http://www.google.com")
##   domain_name  category
## 1  google.com 0.3133728
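
Because the `category` column holds a probability rather than a label, a downstream step usually has to choose a cutoff. The sketch below assumes a data frame shaped like the output above (columns `domain_name` and `category`). The 0.5 threshold and the `label` column are illustrative choices, not part of the package.

```r
# Hypothetical result, shaped like the adult_ml1_cat() output shown above
res <- data.frame(
  domain_name = "google.com",
  category    = 0.3133728,
  stringsAsFactors = FALSE
)

# Illustrative cutoff; pick one that matches your tolerance for false positives
threshold <- 0.5

# Derive a label from the probability
res$label <- ifelse(res$category >= threshold, "adult", "not adult")
print(res)
```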

VirusTotal

Start by getting an API key from VirusTotal.

The package uses the VirusTotal API v3 for comprehensive domain analysis:

virustotal_cat("http://www.google.com")

OpenAI GPT Models

Get domain categorization using OpenAI’s GPT models. You’ll need an OpenAI API key:

# Set your API key
Sys.setenv(OPENAI_API_KEY = "your-api-key-here")

# Classify domains
openai_cat("google.com")
##   domain_name openai_category
## 1  google.com      technology
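
A missing key may only surface as an error once a request is sent, so it can help to check the environment variable first. This is a minimal base R sketch; `has_key()` is a hypothetical helper, not a function exported by rdomains.

```r
# Hypothetical helper: TRUE if the named environment variable is set and non-empty
has_key <- function(var) {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_key("OPENAI_API_KEY")) {
  message("OPENAI_API_KEY is not set; set it with Sys.setenv() or in ~/.Renviron")
}
```

The same check works for any other key the package reads from the environment, such as `ANTHROPIC_API_KEY`.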

You can also specify custom categories:

openai_cat(c("amazon.com", "github.com"), 
           categories = c("ecommerce", "technology", "social", "other"))

Anthropic Claude

Get domain categorization using Anthropic’s Claude models. You’ll need an Anthropic API key:

# Set your API key
Sys.setenv(ANTHROPIC_API_KEY = "your-api-key-here")

# Classify domains
claude_cat("facebook.com")
##   domain_name claude_category
## 1 facebook.com          social
