Example data sets:
REGICOR: These data come from three cross-sectional surveys of individuals representative of the population of a north-east Spanish province (Girona), carried out as part of the REGICOR study. The data set contains 2294 observations on 21 variables such as age, sex, cholesterol profile, hypertension, etc.
PREDIMED: The PREDIMED trial (Prevención con Dieta Mediterránea) is a randomized, parallel, multicentre trial with more than 7,000 participants who were randomly assigned to three diet groups (olive oil + Mediterranean diet, nuts + Mediterranean diet, and a low-fat diet as the control group) and followed up for more than 7 years. The data set contains 6324 observations on 15 variables such as age, sex and diabetes status, as well as cardiovascular events during the follow-up time. For more information about the study these data come from, visit http://predimed.onmedic.net
SNPs: A data.frame containing 35 selected SNPs and other clinical covariates for 110 cases and 47 controls in a case-control study.
If you select R-format, the uploaded file should contain a single object of class data.frame.
If you select TEXT format, make sure to set the right TEXT Options so that your data can be read correctly.
It may be necessary to refresh the web page when loading a new data set.
Change the Encoding options to make sure that non-standard characters such as "ñ", "ç", etc. are read correctly.
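Why the encoding option matters can be illustrated outside the application. The Python sketch below is only an illustration, not the code the application runs; the function name `read_delimited` and the sample data are hypothetical. It decodes the file bytes with the declared encoding before parsing the delimited text, and shows that declaring the wrong encoding fails.

```python
import csv
import io


def read_delimited(raw_bytes, encoding="utf-8", delimiter=","):
    """Decode raw file bytes with the given encoding, then parse the rows."""
    text = raw_bytes.decode(encoding)
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical file content: semicolon-separated and saved as Latin-1.
sample = "name;age\nIñaki;54\nFrançoise;61\n".encode("latin-1")

rows = read_delimited(sample, encoding="latin-1", delimiter=";")
print(rows[0]["name"])

# Declaring the wrong encoding for the same bytes raises a decoding error.
try:
    read_delimited(sample, encoding="utf-8", delimiter=";")
    wrong_encoding_failed = False
except UnicodeDecodeError:
    wrong_encoding_failed = True
print(wrong_encoding_failed)
```

Note that all values are read as text here; converting columns such as age to numbers would be a separate step.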
Once your data have been loaded successfully, this panel opens automatically.
A list of all variables will appear in the left-hand list. Use the "<>" button to move variables from the left-hand list to the right-hand list and vice versa. Once you have chosen which variables to analyse and which to discard, press the "Update" button. The variables remaining in the left-hand list are the ones analysed, i.e. they form the rows of the descriptive tables.
Set the type of your variables to be analysed as:
Normal: mean (standard deviation),
Non-normal: median [first quartile; third quartile] or
Categorical: absolute frequencies (percentages).
NA: Select this option to test whether a continuous variable is normal by means of a Shapiro-Wilk test. With this option, a numeric variable with five or fewer unique values is also treated as categorical.
Appropriate statistical tests to compare groups (if specified) are computed depending on the type (normal, non-normal or categorical).
To learn more about the tests performed and the functions used in each case, see the paper published in the Journal of Statistical Software (JSS).
Use the "Format" tab in the "Step 4. Display" panel to change how statistics are displayed (e.g. whether or not to show percentages).
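The three summary types described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the application's own code: the function names are hypothetical, the quartiles use Python's "inclusive" method (other quartile definitions give slightly different values), and the normality test itself is not reproduced.

```python
import statistics
from collections import Counter


def describe_normal(values, digits=1):
    """Mean (standard deviation), for variables treated as normal."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return f"{mean:.{digits}f} ({sd:.{digits}f})"


def describe_non_normal(values, digits=1):
    """Median [first quartile; third quartile], for non-normal variables."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{median:.{digits}f} [{q1:.{digits}f}; {q3:.{digits}f}]"


def describe_categorical(values, digits=1):
    """Absolute frequency (percentage) for each category."""
    counts = Counter(values)
    total = len(values)
    return {
        level: f"{n} ({100 * n / total:.{digits}f}%)"
        for level, n in sorted(counts.items())
    }


def treated_as_categorical(values, max_unique=5):
    """Rule from the text: five or fewer unique values means categorical."""
    return len(set(values)) <= max_unique


ages = [50, 55, 60, 65, 70]           # hypothetical data
smoker = ["no", "no", "yes", "no"]    # hypothetical data

print(describe_normal(ages))          # 60.0 (7.9)
print(describe_non_normal(ages))      # 60.0 [55.0; 65.0]
print(describe_categorical(smoker))   # {'no': '3 (75.0%)', 'yes': '1 (25.0%)'}
print(treated_as_categorical([0, 1, 1, 2, 0, 1]))  # True
```

The `digits` argument plays the role of the number of decimals, which the application lets you set separately.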
Select which variable indicates the group.
None: Descriptives are computed for the entire data set, with no groups.
Group: A variable must be selected. By default, only factor variables with five or fewer categories can be selected. The descriptives are computed for each of the groups defined by this variable.
Survival: Use this option to perform a survival analysis in which the response is a right-censored variable, i.e. it is defined by a time variable and a variable indicating the censoring status (whether or not the individual suffered the disease). The category that indicates the disease or case status must also be specified.
Choose the category you want to hide for categorical variables.
Additionally, in the "hide no" text box you can type the category that represents "no"; the category with that name is then hidden for all binary variables.
Type a selection criterion to choose which individuals are included.
You can also select a subgroup of individuals for each variable. In either case, a logical expression in the R language must be typed ("==" to compare, "&" for "and", "|" for "or" and "!" for "not").
To indicate a category, you must type the number that appears in the "VALUES" table on the right side of the application instead of its name. For example, for gender, type "1" instead of "male" and "2" instead of "female" if "male" is the first category and "female" is the second.
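The logic of such a selection can be sketched in Python. This is only an illustration of how category numbers and logical operators combine; the variable names `gender` and `age`, the data and the helper `category_number` are hypothetical, and in the application itself the expression is typed in R syntax.

```python
# Hypothetical category order, as it would be listed in the "VALUES" table.
GENDER_LEVELS = ["male", "female"]


def category_number(levels, name):
    """Return the 1-based number assigned to a category name."""
    return levels.index(name) + 1


people = [
    {"id": 1, "gender": "male", "age": 62},
    {"id": 2, "gender": "female", "age": 58},
    {"id": 3, "gender": "male", "age": 45},
    {"id": 4, "gender": "female", "age": 71},
]


def code(person):
    return category_number(GENDER_LEVELS, person["gender"])


# R expression: gender == 1 & age > 50
men_over_50 = [p["id"] for p in people if code(p) == 1 and p["age"] > 50]

# R expression: gender == 2 | !(age > 50)
women_or_younger = [
    p["id"] for p in people if code(p) == 2 or not (p["age"] > 50)
]

print(men_over_50)       # [1]
print(women_or_younger)  # [2, 3, 4]
```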
Set the options for the Odds Ratio (OR), computed when the response is binary, or the Hazard Ratio (HR), computed for a right-censored response.
You can specify the reference category used when computing the OR or HR for categorical variables. For continuous variables, the scale can be changed. This may be very useful for variables with a wide range such as total cholesterol, where it makes more sense to display the OR/HR for each 10 units of change rather than for a single unit.
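The effect of changing the scale can be worked out by hand. Assuming the OR or HR comes from a model that is linear on the log scale (as in logistic or Cox regression), the ratio for a change of k units equals the per-unit ratio raised to the power k. The Python sketch below illustrates this arithmetic; the function name and the per-unit value of 1.005 are hypothetical, not results from any of the example data sets.

```python
import math


def rescale_ratio(ratio_per_unit, units):
    """Convert an OR/HR per 1 unit into the OR/HR per `units` units.

    On the log scale the effect is additive, so the coefficient is
    multiplied by `units` before exponentiating.
    """
    beta = math.log(ratio_per_unit)
    return math.exp(units * beta)


# Hypothetical OR of 1.005 per 1 unit of total cholesterol.
or_per_unit = 1.005
or_per_10_units = rescale_ratio(or_per_unit, 10)

print(round(or_per_unit, 3))      # 1.005
print(round(or_per_10_units, 3))  # 1.051
```

A per-unit OR of 1.005 is hard to read in a table, whereas roughly 1.05 per 10 units conveys the same association more clearly.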
Choose what is displayed in the bivariate table:
ALL: Descriptives for the entire data set.
p-overall: p-value comparing the means, medians, proportions or incidence among groups.
Descriptives: Descriptives (means, medians, proportions, etc.) by groups.
p-trend: p-value for trend. The groups are assumed to be ordered.
OR/HR: Odds Ratio or Hazard Ratio for binary or time-to-event response (grouping variable), respectively.
Available: Number of individuals with valid data for each variable and each group.
NA category: For categorical variables, non-available data are considered as a new category.
Pairwise p-value: When more than two groups are considered, p-values for the pairwise (2 by 2) comparisons are computed, taking multiple testing into account.
Simplify: Empty categories (with no available data) are removed from the analyses.
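To give an idea of what "taking multiple testing into account" means for the pairwise p-values listed above, the Python sketch below adjusts a set of raw pairwise p-values. The text does not state which correction the application applies, so the Holm step-down procedure used here is an assumption chosen as one common option; the group names and raw p-values are hypothetical.

```python
from itertools import combinations


def holm_adjust(p_values):
    """Holm step-down adjustment of a list of raw p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


groups = ["A", "B", "C"]                 # hypothetical group labels
pairs = list(combinations(groups, 2))    # the 2 by 2 comparisons
raw_p = [0.01, 0.04, 0.03]               # hypothetical raw p-values

adjusted_p = holm_adjust(raw_p)
for (g1, g2), p_raw, p_adj in zip(pairs, raw_p, adjusted_p):
    print(f"{g1} vs {g2}: raw p = {p_raw:.3f}, adjusted p = {p_adj:.3f}")
```

With three groups there are three pairwise comparisons, so each raw p-value is inflated before being compared with the usual threshold.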
Specify how to display the mean and standard deviation, the quantiles and the frequencies.
Set the number of decimals to be displayed in the bivariate table, for descriptives, p-values and for OR / HR.
Change the "key" headers of the descriptive table such as "ALL", "p-value", etc.
Save the bivariate table in different formats: PDF, CSV, TXT, HTML, Word (.docx) or Excel (.xlsx).
If PDF format is selected, size can be specified as well as landscape format. These options may be useful when the table is big or contains lots of columns.
In this tab you can visualize some basic information about the described variables:
Name: Name of the variable in the data set.
Label: Label of the variable. If the variable has no label (e.g. data imported from a TXT or Excel file), the name of the variable is assigned to the label.
Method: Indicates how the variable is treated. By default, character or factor variables are treated as categorical, whereas numeric variables are treated as normal (displaying mean and standard deviation) if they follow a normal distribution, or as non-normal (displaying median and quartiles) otherwise. If a numeric variable contains five or fewer unique values, it is also treated as categorical and therefore frequencies are displayed.
The user can change the default method by using the proper widget in the "Step 3. Setting" panel.
N: Number of valid (i.e. non-missing) values for each variable.
Values: The minimum and maximum values if the variable is numeric, or the different categories otherwise. By default, a maximum of 10 categories is displayed. Note that a consecutive number is assigned to each category, and this number is used when subsetting (see the information in the "Subset" tab inside the "Step 3. Settings" panel).
In this tab the first rows of the data set are displayed.
Thanks to the DataTables library, you can specify how many rows are visible and navigate through the data set.
It is also possible to sort the records simply by clicking on the arrow beside each column name.
Moreover, it is very easy to filter rows by setting the range for numerical variables, or the levels for categorical variables, in the widgets just below each column name.
In this tab the bivariate table is shown. It can be displayed in three different formats:
HTML, PDF or as it is printed in the R console.
Note that, although the R console format is the least attractive, it is the fastest to display and may therefore be the most useful way to check how the bivariate table looks as options (e.g. the grouping variable) are modified on the fly.
In this tab, univariate or bivariate plots (the latter taking the groups into account, where applicable) of the selected variable are displayed.
Click on the plot to save it in different formats: PDF, bmp, jpg, png and tiff.
In this tab, descriptives of SNPs (Single Nucleotide Polymorphisms) are displayed, with appropriate statistics and tests for these genetic variants.
To perform this analysis, only SNP variables can be selected.
Note that a factor response to display genotype frequencies by groups is permitted but not a time-to-event response variable.
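Two of the typical SNP statistics, allele frequencies and a Hardy-Weinberg equilibrium (HWE) test, can be sketched in Python as follows. This is an illustration only: the text does not say which HWE test the application uses, so the chi-square goodness-of-fit version with one degree of freedom is an assumption, and the function names, the "A/G" genotype coding and the counts are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from genotypes coded like 'A/G' (assumed coding)."""
    counts = Counter()
    for genotype in genotypes:
        first, second = genotype.split("/")
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square HWE test (1 df) from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0, 1.0               # monomorphic SNP: nothing to test
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


genotypes = ["A/A", "A/G", "A/G", "G/G"]        # hypothetical sample
print(allele_frequencies(genotypes))            # {'A': 0.5, 'G': 0.5}

chi2_ok, p_ok = hwe_chi_square(25, 50, 25)      # counts match HWE exactly
chi2_bad, p_bad = hwe_chi_square(40, 20, 40)    # too few heterozygotes
print(chi2_ok, p_ok)
print(round(chi2_bad, 6), p_bad < 0.001)
```

A small p-value suggests that the observed genotype counts depart from what Hardy-Weinberg equilibrium predicts, which in practice is often used as a genotyping quality check.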
compareGroups Project
Create data summaries for quality control, extensive reports for exploring data, as well as publication-ready univariate or bivariate tables in several formats:
plain text,
HTML,
LaTeX,
PDF,
Word or
Excel
Produce figures to quickly visualise the distribution of your data (boxplots, barplots, normality plots, etc.).
Display statistics (mean, median, frequencies, incidences, etc.)
Perform the appropriate tests (t-test, Analysis of variance, Kruskal-Wallis, Fisher, log-rank, ...) depending on the nature of the described variable (normal, non-normal or qualitative).
Summarize genetic data (Single Nucleotide Polymorphisms), displaying allele frequencies and performing Hardy-Weinberg equilibrium tests, among other typical statistics and tests for this kind of data.
compareGroups Web User Interface (WUI)
This WUI is designed to allow clinicians and scientists to take advantage of the functionality of compareGroups to analyse their epidemiological data, but without the need to install or learn how to use R.
This system has all of the functionality of the current command line version of compareGroups, but has the advantage of allowing you to load and analyse your data on any device with an internet connection.
This WUI has been built entirely using Shiny tools.
Data Security when using the compareGroups WUI
compareGroups is developed by experienced researchers who are actively involved in epidemiological research on human subjects. We understand very well how important it is to guarantee data confidentiality and security, and we have designed the compareGroups WUI so that your data will remain as secure as at your own institute.
Will my data be safe if I load them into the compareGroups WUI?
The compareGroups WUI is hosted on a secure cloud server platform, which guarantees the physical and electronic security of the WUI, and therefore of any data that are loaded into it. Data loaded into the WUI are transmitted over a secure connection via HTTPS.
What happens to my data when I load them into the compareGroups WUI?
If you're familiar with R, you'll know that it loads all data and performs all calculations within the RAM allocated for that R session. This means that it never writes data to the disk, and when the R session ends, all data it has been working on are eliminated.
Accessing the compareGroups WUI page starts a dedicated R session to read and manipulate your data, and to compute your results. However, this R session never saves data to the hard disk of the server it is running on, and only sends results to the WUI, either in the form of displayed tables or plots, or as files to be downloaded by the user via the SAVE panel.
When you close your browser window, the R session is terminated, and all data are lost. You can verify this for yourself by simply re-loading the WUI page, after which you will see that no data are loaded. compareGroups developers only have access to the underlying R code that generates the WUI and performs the calculations, and can never gain access to data that have been loaded into a remote session.
If you have any further concerns or questions about the security of your data when using the compareGroups WUI, please get in touch with us via the contact form.