rdomains: Get the category of content hosted by a domain

Install and Load the package

The latest development version of the package will always be on GitHub. To install the package from GitHub:

# Requires the devtools package: install.packages("devtools")
devtools::install_github("themains/rdomains")

To install the package from CRAN, type in:

install.packages("rdomains")

Next, load the package:

library(rdomains)
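
The install and load steps above can be combined so that the package is installed only when it is missing. The following is a minimal sketch in base R; `ensure_package` is a hypothetical helper name introduced here for illustration and is not part of rdomains.

```r
# Install a package only when it is not already available, then load it.
# `ensure_package` is a hypothetical helper, not part of rdomains.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # library() needs character.only = TRUE when the name is held in a variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Example (left commented out so the sketch does not touch the network):
# ensure_package("rdomains")
```

Passing the installer as an argument keeps the helper easy to try out with a stand-in function before pointing it at CRAN.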

Shalla

To get the category of the content from Shallalist, first download the latest file using:

get_shalla_data()

And then, get the category using:

shalla_cat("http://www.google.com")
##   domain_name shalla_category
## 1  google.com   searchengines
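
To categorize several URLs, one option is to call the categorizer once per URL, as in the example above, and stack the resulting data frames. The sketch below assumes nothing about whether `shalla_cat` accepts more than one URL at a time. `categorize_many` and `fake_cat` are hypothetical names; `fake_cat` is a stand-in that only mimics the shape of the output shown above so the sketch runs without the Shalla data.

```r
# Apply a single-URL categorizer to several URLs and stack the results.
# `categorize_many` is a hypothetical helper, not part of rdomains.
categorize_many <- function(urls, categorizer) {
  results <- lapply(urls, categorizer)
  do.call(rbind, results)
}

# Stand-in categorizer (an assumption for illustration): returns a data frame
# shaped like the shalla_cat() output, with the category left as NA.
fake_cat <- function(url) {
  data.frame(domain_name = sub("^https?://(www\\.)?", "", url),
             shalla_category = NA_character_,
             stringsAsFactors = FALSE)
}

urls <- c("http://www.google.com", "http://www.example.com")
combined <- categorize_many(urls, fake_cat)
print(combined)
```

With rdomains loaded and the Shalla file downloaded, `shalla_cat` would presumably take the place of `fake_cat`.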

DMOZ

To get the category of the content from DMOZ, first download the archived, parsed CSV file using:

get_dmoz_data()

And then, get the category using:

dmoz_cat("http://www.google.com")

ML

Get the probability that a domain hosts adult content, based on features of the domain name and suffix alone:

adult_ml1_cat("http://www.google.com")
##   domain_name  category
## 1  google.com 0.3133728
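
Because the function returns a probability rather than a label, turning it into a yes/no flag requires choosing a cutoff. The sketch below uses a data frame shaped like the output above, with the value copied from it. The `flag_adult` helper and the 0.5 cutoff are assumptions made for illustration, not package defaults.

```r
# Data frame shaped like the adult_ml1_cat() output above; value copied from it.
scores <- data.frame(domain_name = "google.com",
                     category = 0.3133728,
                     stringsAsFactors = FALSE)

# Hypothetical helper: the 0.5 cutoff is an arbitrary illustrative choice.
flag_adult <- function(scores, cutoff = 0.5) {
  scores$likely_adult <- scores$category >= cutoff
  scores
}

flagged <- flag_adult(scores)
print(flagged)
```

A lower cutoff catches more adult domains at the cost of more false positives, so the right value depends on the application.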

Virustotal

Start by getting an API key from VirusTotal.

Then get the VirusTotal categories by running:

virustotal_cat("http://www.google.com")
##                 domain   bitdefender dr_web  alexa        google       websense             trendmicro
## 1 http://www.google.com searchengines  chats google searchengines advertisements search engines portals

Trusted (McAfee)

Get the content category of a domain according to McAfee (Trusted):

trusted_cat("http://www.google.com")
##                    url          status   categorization   reputation
## 2 http://www.google.com Categorized URL - Search Engines Minimal Risk

Alexa Category

To get the category of content from Amazon's Alexa, which provides it via DMOZ, start by getting credentials from https://aws.amazon.com/. Next, set the environment variables:

# Sys.setenv() requires named arguments; replace the placeholder values with your credentials
Sys.setenv(AWS_ACCESS_KEY_ID = "key_id")
Sys.setenv(AWS_SECRET_ACCESS_KEY = "secret_key")

Then run:

alexa_cat(domain="http://www.google.com")[1,]
##                   Title                                           AbsolutePath
## 1 Search Engines/Google Top/Computers/Internet/Searching/Search_Engines/Google
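
Since `alexa_cat` depends on the two AWS environment variables, a quick check before calling it can catch a missing credential early. This is a minimal base R sketch; `missing_env` is a hypothetical helper and not part of rdomains.

```r
# Return the names of required environment variables that are unset or empty.
# `missing_env` is a hypothetical helper, not part of rdomains.
missing_env <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

required <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
not_set <- missing_env(required)
if (length(not_set) > 0) {
  message("Set these before calling alexa_cat(): ",
          paste(not_set, collapse = ", "))
}
```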
