RIdeogram: drawing SVG graphics to visualize and map genome-wide data in idiograms

Zhaodong Hao

2019-01-03

Introduction

RIdeogram is an R package that draws SVG (Scalable Vector Graphics) graphics to visualize and map genome-wide data in idiograms.

Citation

If you use this package in a published paper, please cite this paper:

Zhaodong Hao, Dekang Lv, Ying Ge, Jisen Shi, Guangchuang Yu, and Jinhui Chen (2018). RIdeogram: Drawing SVG graphics to visualize and map genome-wide data in idiograms. R package version 0.1.0.

Usage and Examples

This is a simple package with only two main functions: ideogram and convertSVG.

First, load the package after you have installed it.

require(RIdeogram)
#> Loading required package: RIdeogram

Then, you need to load the data from the RIdeogram package.

data(human_karyotype, package="RIdeogram")
data(gene_density, package="RIdeogram")
data(Random_RNAs_500, package="RIdeogram")

You can use the function “head()” to see the data format.

head(human_karyotype)
#>   Chr Start       End  CE_start    CE_end
#> 1   1     0 248956422 122026459 124932724
#> 2   2     0 242193529  92188145  94090557
#> 3   3     0 198295559  90772458  93655574
#> 4   4     0 190214555  49712061  51743951
#> 5   5     0 181538259  46485900  50059807
#> 6   6     0 170805979  58553888  59829934

Specifically, the ‘karyotype’ file contains the karyotype information and has five columns (or three, see below). The first column is the chromosome ID, the second and third columns are the start and end positions of each chromosome, and the fourth and fifth columns are the start and end positions of the corresponding centromere.
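
The column layout described above can be checked before plotting. The following is a minimal base-R sketch, not part of RIdeogram: the helper name check_karyotype is hypothetical, and the rules it enforces (start before end, centromere inside the chromosome) are assumptions drawn from the column description rather than from the package source.

```r
# Hypothetical helper (not part of RIdeogram): basic sanity checks for a
# karyotype table with three or five columns.
check_karyotype <- function(karyotype) {
  stopifnot(is.data.frame(karyotype), ncol(karyotype) %in% c(3, 5))
  # Columns 2 and 3: chromosome start and end positions
  stopifnot(all(karyotype[[2]] < karyotype[[3]]))
  if (ncol(karyotype) == 5) {
    # Columns 4 and 5: centromere start and end, inside the chromosome
    stopifnot(all(karyotype[[4]] < karyotype[[5]]),
              all(karyotype[[4]] >= karyotype[[2]]),
              all(karyotype[[5]] <= karyotype[[3]]))
  }
  invisible(TRUE)
}

# First two rows of human_karyotype, copied from the head() output
toy_karyotype <- data.frame(
  Chr      = c("1", "2"),
  Start    = c(0, 0),
  End      = c(248956422, 242193529),
  CE_start = c(122026459, 92188145),
  CE_end   = c(124932724, 94090557),
  stringsAsFactors = FALSE
)

check_karyotype(toy_karyotype)         # five-column form
check_karyotype(toy_karyotype[, 1:3])  # three-column form
```

A table with any other number of columns, or with a centromere outside its chromosome, stops with an error.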

head(gene_density)
#>   Chr   Start     End Value
#> 1   1       1 1000000    65
#> 2   1 1000000 2000000    76
#> 3   1 2000000 3000000    35
#> 4   1 3000000 4000000    30
#> 5   1 4000000 5000000    10
#> 6   1 5000000 6000000    10

The heatmap data (here ‘gene_density’, passed to the argument ‘overlaid’) has four columns. The first column is the chromosome ID, the second and third columns are the start and end positions of windows in the corresponding chromosome, and the fourth column is a characteristic value for each window, such as the number of genes.
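
If you only have feature positions, a table in this four-column layout can be derived by counting features per window. The sketch below uses base R and invented toy positions; the 1 Mb window size mirrors the spacing seen in gene_density, and the variable names are illustrative only. Note that its first window starts at 0 rather than 1, and windows without any gene are simply absent.

```r
# Toy input (invented for illustration): gene start positions on two chromosomes
genes <- data.frame(
  Chr   = c("1", "1", "1", "1", "2", "2"),
  Start = c(15000, 820000, 1200000, 2500000, 40000, 1999999),
  stringsAsFactors = FALSE
)

window_size <- 1e6  # 1 Mb windows

# 0-based index of the window each gene falls into
genes$win <- genes$Start %/% window_size

# Count genes per chromosome and window; the count ends up in column 'Start'
counts <- aggregate(Start ~ Chr + win, data = genes, FUN = length)

toy_density <- data.frame(
  Chr   = counts$Chr,
  Start = counts$win * window_size,
  End   = (counts$win + 1) * window_size,
  Value = counts$Start,
  stringsAsFactors = FALSE
)
toy_density <- toy_density[order(toy_density$Chr, toy_density$Start), ]
toy_density
```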

head(Random_RNAs_500)
#>    Type    Shape Chr    Start      End  color
#> 1  tRNA   circle   6 69204486 69204568 6a3d9a
#> 2  rRNA      box   3 68882967 68883091 33a02c
#> 3  rRNA      box   5 55777469 55777587 33a02c
#> 4  rRNA      box  21 25202207 25202315 33a02c
#> 5 miRNA triangle   1 86357632 86357687 ff7f00
#> 6 miRNA triangle  11 74399237 74399333 ff7f00

The label data (here ‘Random_RNAs_500’, passed to the argument ‘label’) has six columns. The first column is the label type; the second column is the shape of the label, with three available options: box, triangle and circle; the third column is the chromosome ID; the fourth and fifth columns are the start and end positions of the label on the chromosome; and the sixth column is the color of the label.

Alternatively, you can load your own data with the function “read.table”, for example:
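
A label table in this six-column layout can be assembled from a plain list of features by looking up one shape and one color per type. This is a base-R sketch with illustrative variable names; the coordinates, shapes and colors are copied from the head() output, where colors are stored without a leading “#”.

```r
# Toy features: coordinates copied from the first rows of Random_RNAs_500
features <- data.frame(
  Type  = c("tRNA", "rRNA", "miRNA"),
  Chr   = c(6, 3, 1),
  Start = c(69204486, 68882967, 86357632),
  End   = c(69204568, 68883091, 86357687),
  stringsAsFactors = FALSE
)

# One shape and one color per label type
shape_of <- c(tRNA = "circle", rRNA = "box", miRNA = "triangle")
color_of <- c(tRNA = "6a3d9a", rRNA = "33a02c", miRNA = "ff7f00")

toy_labels <- data.frame(
  Type  = features$Type,
  Shape = unname(shape_of[features$Type]),
  Chr   = features$Chr,
  Start = features$Start,
  End   = features$End,
  color = unname(color_of[features$Type]),
  stringsAsFactors = FALSE
)

# Only the three documented shapes are allowed
stopifnot(all(toy_labels$Shape %in% c("box", "triangle", "circle")))
toy_labels
```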

human_karyotype <- read.table("karyotype.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
gene_density <- read.table("data_1.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
Random_RNAs_500 <- read.table("data_2.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

The “karyotype.txt” file contains karyotype information; the “data_1.txt” file contains heatmap data; the “data_2.txt” file contains track label data.

These three files are all you need. Now you can visualize this information using the ideogram function.
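
To see what such a tab-delimited file looks like on disk, the sketch below writes a toy three-column karyotype to a temporary file and reads it back with the same read.table arguments. The temporary path and the variable names are illustrative; your real files can live anywhere.

```r
# Toy karyotype: first two human chromosomes, three-column form
toy_karyotype <- data.frame(
  Chr   = c("1", "2"),
  Start = c(0, 0),
  End   = c(248956422, 242193529),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row and no row names
path <- tempfile(fileext = ".txt")
write.table(toy_karyotype, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Inspect the raw lines, then read the file back
raw_lines <- readLines(path)
reloaded <- read.table(path, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
str(reloaded)
```

Note that read.table guesses column types, so chromosome IDs that look like numbers come back as integers.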

Basic usage

ideogram(karyotype, overlaid = NULL, label = NULL, colorset1, colorset2, width, Lx, Ly, output = "chromosome.svg")
convertSVG(svg, device, width, height, dpi)

Now, let’s begin.

First, we draw an idiogram with no mapping data.

ideogram(karyotype = human_karyotype)
convertSVG("chromosome.svg", device = "png")

Then, you will find an SVG file and a PNG file in your working directory.

Next, we can map genome-wide data on the chromosome idiogram. In this case, we visualize the gene density across the human genome.

ideogram(karyotype = human_karyotype, overlaid = gene_density)
convertSVG("chromosome.svg", device = "png")

Alternatively, we can map some genome-wide data with track labels next to the chromosome idiograms.

ideogram(karyotype = human_karyotype, label = Random_RNAs_500)
convertSVG("chromosome.svg", device = "png")

We can also map the overlaid heatmap and track labels on the chromosome idiograms at the same time.

ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500)
convertSVG("chromosome.svg", device = "png")

If you want to change the color of the heatmap, you can modify the argument ‘colorset1’ (the default set is colorset1 = c("#4575b4", "#ffffbf", "#d73027")). You can use either color names as listed by colors() or hexadecimal strings of the form “#rrggbb” or “#rrggbbaa”.

ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, colorset1 = c("#fc8d59", "#ffffbf", "#91bfdb"))
convertSVG("chromosome.svg", device = "png")
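
The color formats mentioned above can be checked before plotting. The sketch below uses only grDevices: is_valid_color is a hypothetical helper, not a RIdeogram function, and the interpolation with colorRampPalette is just a way to preview a gradient between the anchor colors. How RIdeogram itself interpolates the heatmap colors is not stated here.

```r
# Hypothetical helper: TRUE for every string that R can interpret as a color
is_valid_color <- function(x) {
  vapply(x, function(col) {
    !inherits(try(grDevices::col2rgb(col), silent = TRUE), "try-error")
  }, logical(1), USE.NAMES = FALSE)
}

heat_colors <- c("#4575b4", "#ffffbf", "#d73027")  # default set quoted above

# Hex strings, a color name, an alpha channel, and an invalid string
ok_default <- is_valid_color(heat_colors)
ok_mixed   <- is_valid_color(c("steelblue", "#ff7f00cc", "not-a-color"))

# Preview: five colors interpolated across the three anchors
ramp <- grDevices::colorRampPalette(heat_colors)(5)
ramp
```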

If you do not know the centromere positions in your species, you do not need to modify the script. In this case, the ‘karyotype’ file has only three columns.

To simulate this case, we delete the last two columns of the ‘human_karyotype’ file.

human_karyotype <- human_karyotype[,1:3]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500)
convertSVG("chromosome.svg", device = "png")

If your species has only ten chromosomes, you may need to modify the argument ‘width’ (default value is 170).

To simulate this case, we keep only the first ten rows of the ‘human_karyotype’ file.

Before

human_karyotype <- human_karyotype[1:10,]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500)
convertSVG("chromosome.svg", device = "png")

After

human_karyotype <- human_karyotype[1:10,]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, width = 100)
convertSVG("chromosome.svg", device = "png")

If you want to move the legend, modify the arguments ‘Lx’ and ‘Ly’ (default values are 160 and 35, respectively).

‘Lx’ is the distance between the upper-left point of the legend and the left margin; ‘Ly’ is the distance between the upper-left point of the legend and the upper margin.

ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, width = 100, Lx = 80, Ly = 25)
convertSVG("chromosome.svg", device = "png")

If you have two sets of heatmap data, such as gene density and LTR density, you can use the following scripts to map and visualize these data in idiograms.

data(human_karyotype, package="RIdeogram") # reload the karyotype data
# use the arguments 'colorset1' and 'colorset2' to set the colors of the gene and LTR heatmaps, respectively
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = LTR_density, colorset1 = c("#f7f7f7", "#e34a33"), colorset2 = c("#f7f7f7", "#2c7fb8"))
convertSVG("chromosome.svg", device = "png")

In addition, you can use the argument “device” (default value is “png”) to set the format of the output file, such as “tiff”, “pdf” or “jpg”. You can also use the argument “dpi” (default value is 300) to set the resolution of the output image file.

convertSVG("chromosome.svg", device = "tiff", dpi = 600)

There are also four shortcuts that convert SVG images to these image formats without the need to set the argument “device”:

svg2tiff("chromosome.svg")
svg2pdf("chromosome.svg")
svg2jpg("chromosome.svg")
svg2png("chromosome.svg")