immunogenetr is a comprehensive toolkit for clinical HLA informatics, built on tidyverse principles. It uses the genotype list string (GL string, https://glstring.org/) as its core data structure for storing and computing on HLA genotype data.
This vignette walks through the main workflows.
Clinical HLA data is typically stored in a tabular format, with each
allele in its own column. immunogenetr includes the
HLA_typing_1 dataset as an example:
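# Load immunogenetr and dplyr (dplyr supplies %>%, mutate(), filter() and the other verbs used below).
library(immunogenetr)
library(dplyr)
# HLA_typing_1 contains typing for 10 individuals across all classical HLA loci.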
head(HLA_typing_1, 3)
#> # A tibble: 3 × 19
#> patient A1 A2 C1 C2 B1 B2 DRB345_1 DRB345_2 DRB1_1 DRB1_2
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 A*24:02 A*29:… C*07… C*16… B*44… B*44… DRB5*01… DRB5*01… DRB1*… DRB1*…
#> 2 2 A*02:01 A*11:… C*07… C*07… B*07… B*08… DRB3*01… DRB4*01… DRB1*… DRB1*…
#> 3 3 A*02:01 A*26:… C*02… C*03… B*27… B*54… DRB3*02… DRB4*01… DRB1*… DRB1*…
#> # ℹ 8 more variables: DQA1_1 <chr>, DQA1_2 <chr>, DQB1_1 <chr>, DQB1_2 <chr>,
#> # DPA1_1 <chr>, DPA1_2 <chr>, DPB1_1 <chr>, DPB1_2 <chr>
The HLA_columns_to_GLstring() function converts these
columns into a single GL string per individual. When used inside
mutate(), pass the dot placeholder (.) as the first argument so the
function receives the working data frame:
HLA_typing_GL <- HLA_typing_1 %>%
# Convert all typing columns (A1 through DPB1_2) into a GL string.
mutate(
GL_string = HLA_columns_to_GLstring(., HLA_typing_columns = A1:DPB1_2),
.after = patient
) %>%
# Keep only patient ID and the new GL string column.
select(patient, GL_string)
# View the GL strings.
(HLA_typing_GL)
#> # A tibble: 10 × 2
#> patient GL_string
#> <int> <chr>
#> 1 1 HLA-A*24:02+HLA-A*29:02^HLA-C*07:04+HLA-C*16:01^HLA-B*44:02+HLA-B*44…
#> 2 2 HLA-A*02:01+HLA-A*11:05^HLA-C*07:01+HLA-C*07:02^HLA-B*07:02+HLA-B*08…
#> 3 3 HLA-A*02:01+HLA-A*26:18^HLA-C*02:02+HLA-C*03:04^HLA-B*27:05+HLA-B*54…
#> 4 4 HLA-A*29:02+HLA-A*30:02^HLA-C*06:02+HLA-C*07:01^HLA-B*08:01+HLA-B*13…
#> 5 5 HLA-A*02:05+HLA-A*24:02^HLA-C*07:18+HLA-C*12:03^HLA-B*35:03+HLA-B*58…
#> 6 6 HLA-A*01:01+HLA-A*24:02^HLA-C*07:01+HLA-C*14:02^HLA-B*49:01+HLA-B*51…
#> 7 7 HLA-A*03:01+HLA-A*03:01^HLA-C*03:03+HLA-C*16:01^HLA-B*15:01+HLA-B*51…
#> 8 8 HLA-A*01:01+HLA-A*32:01^HLA-C*06:02+HLA-C*07:02^HLA-B*08:01+HLA-B*37…
#> 9 9 HLA-A*03:01+HLA-A*30:01^HLA-C*07:02+HLA-C*12:03^HLA-B*07:02+HLA-B*38…
#> 10 10 HLA-A*02:05+HLA-A*11:01^HLA-C*07:18+HLA-C*16:02^HLA-B*51:01+HLA-B*58…
Each GL string encodes the full genotype: alleles within a gene copy
are separated by / (ambiguity), gene copies by
+, and loci by ^.
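To make the delimiter hierarchy concrete, here is a minimal base-R sketch that does not use immunogenetr. The helpers split_gl_string() and columns_to_gl() are hypothetical names used only for illustration; they are not package functions. The typing is a made-up example. The sketch handles only the /, + and ^ delimiters described above. The full GL string specification also defines ~ (phased haplotype) and | (genotype ambiguity), which this sketch ignores.

```r
# Hypothetical helpers for illustration only; not part of immunogenetr.
# Only the "^", "+" and "/" delimiters are handled ("~" and "|" are ignored).

# Split a GL string into loci, then gene copies, then ambiguous alleles.
split_gl_string <- function(gl) {
  # "^" separates loci (the outermost level).
  loci <- strsplit(gl, "^", fixed = TRUE)[[1]]
  lapply(loci, function(locus) {
    # "+" separates the two gene copies at a locus.
    copies <- strsplit(locus, "+", fixed = TRUE)[[1]]
    # "/" separates ambiguous alleles within one gene copy.
    lapply(copies, function(copy) strsplit(copy, "/", fixed = TRUE)[[1]])
  })
}

# Build a GL string from a list holding one character vector of alleles per
# locus, roughly what happens when per-allele typing columns are combined.
columns_to_gl <- function(alleles_by_locus) {
  per_locus <- vapply(alleles_by_locus, paste, character(1), collapse = "+")
  paste(per_locus, collapse = "^")
}

# Made-up example typing with one ambiguous gene copy at HLA-A.
gl <- columns_to_gl(list(
  A = c("HLA-A*02:01/HLA-A*02:09", "HLA-A*68:01"),
  B = c("HLA-B*07:02", "HLA-B*44:02")
))
gl
# "HLA-A*02:01/HLA-A*02:09+HLA-A*68:01^HLA-B*07:02+HLA-B*44:02"

parsed <- split_gl_string(gl)
parsed[[1]][[1]]  # ambiguous first A copy: "HLA-A*02:01" "HLA-A*02:09"
parsed[[2]][[2]]  # second B copy: "HLA-B*44:02"
```

The sketch splits on ^ first because that delimiter sits at the outermost level of the hierarchy and / at the innermost. The nested list it returns mirrors locus, then gene copy, then ambiguous alleles.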
To go the other direction, GLstring_genes() splits a GL
string back into separate columns by locus:
# Take the first patient's GL string and split it into locus columns.
# Note: GLstring_genes and GLstring_genes_expanded use pivot_longer on all
# columns, so only pass the GL string column (no other data types).
single_patient <- HLA_typing_GL[1, "GL_string", drop = FALSE]
GLstring_genes(single_patient, "GL_string")
#> # A tibble: 1 × 9
#> HLA_A HLA_C HLA_B HLA_DRB5 HLA_DRB1 HLA_DQA1 HLA_DQB1 HLA_DPA1 HLA_DPB1
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 HLA-A*24:02… HLA-… HLA-… HLA-DRB… HLA-DRB… HLA-DQA… HLA-DQB… HLA-DPA… HLA-DPB…
For a fully expanded view, with one allele per cell and the two gene copies on separate rows, use
GLstring_genes_expanded():
GLstring_genes_expanded(single_patient, "GL_string")
#> # A tibble: 2 × 9
#> A C B DRB5 DRB1 DQA1 DQB1 DPA1 DPB1
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 HLA-A*24:02 HLA-C*07:04 HLA-B*44:02 HLA-DRB5*01… HLA-… HLA-… HLA-… HLA-… HLA-…
#> 2 HLA-A*29:02 HLA-C*16:01 HLA-B*44:03 HLA-DRB5*01… HLA-… HLA-… HLA-… HLA-… HLA-…
The mismatch functions are the core of immunogenetr. They all take a recipient GL string, a donor GL string, one or more loci, and a direction.
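The direction decides which side's alleles count as mismatched. In the graft-versus-host (GvH) direction, a mismatch is a recipient allele that the donor lacks. In the host-versus-graft (HvG) direction, it is a donor allele that the recipient lacks. The base-R sketch below illustrates the idea. It is not immunogenetr's implementation: locus_mismatches() is a hypothetical helper, and treating "bidirectional" as the larger of the two one-way counts is an assumption. The A and C alleles are those of patients 7 and 9 in HLA_typing_1. B and DRB1 are left out because their printed values are truncated.

```r
# Hypothetical sketch for illustration; not immunogenetr's implementation.
# recip and donor are the two alleles typed at one locus; a homozygous locus
# is listed twice, as in HLA_typing_1.
locus_mismatches <- function(recip, donor, direction = "bidirectional") {
  # Graft-versus-host (GvH): recipient alleles the donor lacks.
  gvh <- sum(!(recip %in% donor))
  # Host-versus-graft (HvG): donor alleles the recipient lacks.
  hvg <- sum(!(donor %in% recip))
  switch(direction,
    GvH = gvh,
    HvG = hvg,
    # Assumption: "bidirectional" reports the larger of the two counts.
    bidirectional = max(gvh, hvg),
    stop("unknown direction: ", direction)
  )
}

# A and C typing for patient 7 (recipient) and patient 9 (donor).
recip <- list(A = c("A*03:01", "A*03:01"), C = c("C*03:03", "C*16:01"))
donor <- list(A = c("A*03:01", "A*30:01"), C = c("C*07:02", "C*12:03"))

# One bidirectional mismatch count per locus: A = 1, C = 2.
mm <- mapply(locus_mismatches, recip, donor)
mm

# Matches per locus are 2 minus mismatches. Summing over A, B, C and DRB1
# would give an X-of-8 grade; with only A and C shown, this is X of 4.
sum(2 - mm)  # 1
```

At HLA-A the homozygous recipient has no GvH mismatch, but the donor's A*30:01 is an HvG mismatch. This asymmetry is what the direction argument captures. Programs differ in how they handle homozygous loci and combine the two directions, so treat this only as a picture of what direction controls, not as a replacement for the package's functions.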
Let’s set up a recipient/donor pair:
# Patient 7 is the recipient, patient 9 is the donor.
recip_gl <- HLA_typing_GL %>% filter(patient == 7) %>% pull(GL_string)
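donor_gl <- HLA_typing_GL %>% filter(patient == 9) %>% pull(GL_string)
The mismatch functions include HLA_mismatch_logical(), HLA_mismatch_number(), and HLA_mismatched_alleles(), which report, respectively, whether any mismatch exists, how many mismatches there are, and which alleles are mismatched.
The HLA_match_summary_HCT() function provides standard
match grades used in hematopoietic cell transplantation: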
# X-of-8 matching (A, B, C, DRB1 bidirectional).
HLA_match_summary_HCT(recip_gl, donor_gl,
direction = "bidirectional",
match_grade = "Xof8"
)
#> [1] 1
# X-of-10 matching (adds DQB1).
HLA_match_summary_HCT(recip_gl, donor_gl,
direction = "bidirectional",
match_grade = "Xof10"
)
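#> [1] 1
A common workflow is comparing one recipient against multiple potential donors:
# Patient 3 is the recipient; compare against all 10 individuals as potential donors
# (including patient 3 itself, which is why one result is a perfect 8/8 self-match).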
recipient <- HLA_typing_GL %>%
filter(patient == 3) %>%
select(GL_string) %>%
rename(GL_string_recip = GL_string)
donors <- HLA_typing_GL %>%
rename(GL_string_donor = GL_string, donor = patient) %>%
# Cross-join to pair recipient with each donor.
cross_join(recipient) %>%
# Calculate 8/8 match grade for each pair.
mutate(
match_8of8 = HLA_match_summary_HCT(
GL_string_recip, GL_string_donor,
direction = "bidirectional",
match_grade = "Xof8"
),
.after = donor
) %>%
# Sort best matches first.
arrange(desc(match_8of8))
donors %>% select(donor, match_8of8)
#> # A tibble: 10 × 2
#> donor match_8of8
#> <int> <int>
#> 1 3 8
#> 2 2 1
#> 3 5 1
#> 4 1 0
#> 5 4 0
#> 6 6 0
#> 7 7 0
#> 8 8 0
#> 9 9 0
#> 10 10 0
HLA_truncate() reduces allele resolution to a specified
number of fields:
# Truncate a four-field allele to two fields.
HLA_truncate("HLA-A*02:01:01:01", fields = 2)
#> [1] "HLA-A*02:01"
# Works on full GL strings too.
HLA_truncate("HLA-A*02:01:01:01+HLA-A*03:01:01:02^HLA-B*07:02:01:01+HLA-B*44:02:01:01",
fields = 2
)
#> [1] "HLA-A*02:01+HLA-A*03:01^HLA-B*07:02+HLA-B*44:02"
HLA_prefix_remove() and HLA_prefix_add()
manage the HLA- and locus prefixes:
# Remove all prefixes to get just the allele fields.
HLA_prefix_remove("HLA-A*02:01")
#> [1] "02:01"
# Keep the locus designation but remove "HLA-".
HLA_prefix_remove("HLA-A*02:01", keep_locus = TRUE)
#> [1] "A*02:01"
# Add the full prefix back.
HLA_prefix_add("02:01", "HLA-A*")
#> [1] "HLA-A*02:01"
# "HLA-" is added by default.
HLA_prefix_add("A*02:01")
#> [1] "HLA-A*02:01"
GLstring_regex() builds regular expressions for searching within GL strings
that respect field boundaries, so a search for one allele does not partially
match a different allele that merely shares leading digits:
gl <- "HLA-A*02:01:01+HLA-A*68:01^HLA-B*07:01+HLA-B*15:01"
# A two-field search correctly matches the three-field allele.
pattern <- GLstring_regex("HLA-A*02:01")
stringr::str_detect(gl, pattern)
#> [1] TRUE
# But it won't falsely match an allele whose field only starts with the same digits.
stringr::str_detect("HLA-A*02:149:01", GLstring_regex("HLA-A*02:14"))
#> [1] FALSE
When working in the tidyverse, column names with dashes and asterisks
are inconvenient. HLA_column_repair() converts between
WHO-standard (HLA-A*) and tidyverse-friendly
(HLA_A) formats:
# GLstring_genes returns tidyverse-friendly names by default.
repaired <- GLstring_genes(single_patient, "GL_string")
names(repaired)
#> [1] "HLA_A" "HLA_C" "HLA_B" "HLA_DRB5" "HLA_DRB1" "HLA_DQA1" "HLA_DQB1"
#> [8] "HLA_DPA1" "HLA_DPB1"
# Convert back to WHO format with asterisks.
who_names <- HLA_column_repair(repaired, format = "WHO", asterisk = TRUE)
names(who_names)
#> [1] "HLA-A*" "HLA-C*" "HLA-B*" "HLA-DRB5*" "HLA-DRB1*" "HLA-DQA1*"
#> [7] "HLA-DQB1*" "HLA-DPA1*" "HLA-DPB1*"
The read_HML() function extracts GL strings from HML
(Histoimmunogenetics Markup Language) files, a standard format for reporting
HLA typing results from next-generation sequencing:
# immunogenetr ships with two example HML files.
hml_path <- system.file("extdata", "HML_1.hml", package = "immunogenetr")
hml_result <- read_HML(hml_path)
hml_result
#> # A tibble: 5 × 2
#> sampleID GL_string
#> <chr> <chr>
#> 1 22-03848-HLA-031722-AB-AlloSeq-EP HLA-A*33:03:01:01+HLA-A*34:02:01:01^HLA-B*1…
#> 2 22-03849-HLA-031722-AB-AlloSeq-EP HLA-A*23:01:01:01+HLA-A*30:01:01:01^HLA-B*5…
#> 3 22-03850-HLA-031722-AB-AlloSeq-EP HLA-A*02:01:01:01+HLA-A*02:01:01:01^HLA-B*3…
#> 4 22-03851-HLA-031722-AB-AlloSeq-EP HLA-A*02:01:01:01+HLA-A*23:01:01:01^HLA-B*4…
#> 5 22-03852-HLA-031722-AB-AlloSeq-EP HLA-A*33:01:01:01+HLA-A*33:03:01:01/HLA-A*3…
This library is intended for research use. Any application making use of this package in a clinical setting will need to be independently validated according to local regulations.