Welcome to the world of glycan text parsing! If you’ve ever worked with glycan data from different sources, you know the frustration: every database, software tool, and research group seems to have their own way of representing glycan structures in text format.
That’s where glyparse comes to the rescue! 🚀
Think of glyparse as your universal glycan
translator — it can read glycan structures written in many
different “languages” and convert them all into a unified format that
your computer can understand and work with.
Note: All functions in glyparse return
glyrepr::glycan_structure objects. If you are unfamiliar
with glyrepr, you can read the documentation here.
Before we dive in, let’s see what we’re dealing with. Here’s the same N-glycan core structure written in different formats:
| Format | Example | Where You’ll See It |
|---|---|---|
| IUPAC-condensed | Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc | Literature, UniCarbKB |
| IUPAC-short | Mana3(Mana6)Manb4GlcNAcb4GlcNAc | Literature, UniCarbKB |
| IUPAC-extended | alpha-D-Man-(1->3)-[alpha-D-Man-(1->6)]-beta-D-Man-(1->4)-beta-D-GlcNAc-(1->4)-D-GlcNAc | Literature, UniCarbKB |
| GlycoCT | Complex multi-line format | Literature, GlycomeDB |
| WURCS | WURCS=2.0/3,5,4/[...]/1-1-2-3-3/a4-b1_b4-c1... | Literature, GlyTouCan |
| Linear Code | Ma3(Ma6)Mb4GNb4GNb | Literature |
| pGlyco | (N(N(H(H)(H)))) | pGlyco software results |
| StrucGP | A2B2C1D1E2fedcba | StrucGP software results |
Confusing, right? 😵💫 glyparse understands them all!
glyparse provides eight specialized parsers, each optimized for a specific format:

- parse_iupac_condensed(): The most common format
- parse_iupac_short(): Compact literature format
- parse_iupac_extended(): Verbose formal format
- parse_glycoct(): Database standard format
- parse_wurcs(): Modern standardized format
- parse_linear_code(): Linear Code format
- parse_pglyco_struc(): pGlyco software format
- parse_strucgp_struc(): StrucGP software format

All parsers follow the same pattern: they take a character vector of structure strings and return a glyrepr::glycan_structure object that you can analyze.

auto_parse()

Don’t know which format you’re dealing with? Give it to auto_parse()! This function tries to identify the format of each string automatically and then calls the appropriate parser, so even a vector that mixes formats is supported.
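How auto_parse() actually recognizes formats isn’t described here. As a rough illustration only, here is a minimal base R sketch of a heuristic that looks at surface features such as prefixes and linkage syntax. guess_glycan_format() is a hypothetical helper, not part of glyparse, and the pGlyco residue-code set it accepts is an assumption.

```r
# Hypothetical sketch, NOT glyparse's auto_parse() logic: guess the format
# of each string from a few surface features. Base R only.
guess_glycan_format <- function(x) {
  vapply(x, function(s) {
    s <- trimws(s)
    # WURCS strings start with a version prefix
    if (startsWith(s, "WURCS=")) return("wurcs")
    # GlycoCT (condensed) opens with a RES section
    if (startsWith(s, "RES")) return("glycoct")
    # pGlyco: only residue codes and parentheses (code set is an assumption)
    if (grepl("^\\([HNFAG()]+\\)$", s)) return("pglyco")
    # IUPAC-extended writes linkages as (1->3)
    if (grepl("->", s, fixed = TRUE)) return("iupac_extended")
    # IUPAC-condensed writes linkages as (a1-3), (b1-4), ...
    if (grepl("\\([ab?][0-9?]-", s)) return("iupac_condensed")
    "unknown"
  }, character(1), USE.NAMES = FALSE)
}

guess_glycan_format(c(
  "Gal(b1-3)GalNAc(b1-",
  "(N(F)(N(H(H(N))(H(N(H))))))",
  "WURCS=2.0/3,3,2/[a2122h-1b_1-5][a1122h-1b_1-5][a1122h-1a_1-5]/1-2-3/a4-b1_b3-c1"
))
# returns c("iupac_condensed", "pglyco", "wurcs")
```

The order of the checks matters: cheap, unambiguous prefixes come first. Formats without a distinctive marker (IUPAC-short, Linear Code, StrucGP) fall through to "unknown" in this sketch, which is one reason a real detector needs more careful rules.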
```r
library(glyparse)

x <- c(
  "Gal(b1-3)GalNAc(b1-",
  "(N(F)(N(H(H(N))(H(N(H))))))",
  "WURCS=2.0/3,3,2/[a2122h-1b_1-5][a1122h-1b_1-5][a1122h-1a_1-5]/1-2-3/a4-b1_b3-c1"
)
auto_parse(x)
#> <glycan_structure[3]>
#> [1] Gal(b1-3)GalNAc(b1-
#> [2] Hex(??-?)HexNAc(??-?)Hex(??-?)[HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-
#> [3] Man(a1-3)Man(b1-4)Glc(b1-
#> # Unique structures: 3
```

Let’s start with the IUPAC formats.
IUPAC-condensed is the most widely used of these formats, both in the scientific literature and in databases like UniCarbKB.
Want to know more about the IUPAC-condensed format? Check this out!
```r
# Single structure
iupac_condensed <- "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-4)Gal(b1-4)Glc(a1-"
parse_iupac_condensed(iupac_condensed)
#> <glycan_structure[1]>
#> [1] Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-4)Gal(b1-4)Glc(a1-
#> # Unique structures: 1
```

```r
# Multiple structures at once
glycans <- c(
  "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-", # N-glycan core
  "Gal(b1-3)GalNAc(b1-",                                # O-glycan core 1
  "Neu5Ac(a2-3)Gal(b1-3)[GlcNAc(b1-6)]GalNAc(b1-"       # sialylated O-glycan core 2
)
parse_iupac_condensed(glycans)
#> <glycan_structure[3]>
#> [1] Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-
#> [2] Gal(b1-3)GalNAc(b1-
#> [3] Neu5Ac(a2-3)Gal(b1-3)[GlcNAc(b1-6)]GalNAc(b1-
#> # Unique structures: 3
```

The IUPAC-short format is popular in research papers because it saves space:
```r
# The same structures in short format
iupac_short <- c(
  "Mana3(Mana6)Manb4GlcNAcb4GlcNAcb-",
  "Galb3GalNAcb-",
  "Neu5Aca3Galb3(GlcNAcb6)GalNAcb-"
)
parse_iupac_short(iupac_short)
#> <glycan_structure[3]>
#> [1] Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-
#> [2] Gal(b1-3)GalNAc(b1-
#> [3] Neu5Ac(a2-3)Gal(b1-3)[GlcNAc(b1-6)]GalNAc(b1-
#> # Unique structures: 3
```

Notice how much more compact this is! The short format omits the anomeric carbon of each linkage, and the parser infers it: Neu5Ac links through C2, so Neu5Aca3 becomes Neu5Ac(a2-3), while most other residues link through C1, so Mana3 becomes Man(a1-3).
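To make that inference concrete, here is a minimal base R sketch that expands a single IUPAC-short residue token into condensed notation. expand_short_token() is a hypothetical helper, not glyparse’s implementation; it assumes that sialic acids (Neu5Ac, Neu5Gc, Kdn) link through C2 and every other residue through C1, and it handles linear chains only (no branches in parentheses).

```r
# Hypothetical helper, NOT part of glyparse: expand one IUPAC-short token
# (e.g. "Neu5Aca3") into IUPAC-condensed notation (e.g. "Neu5Ac(a2-3)").
# Assumption: sialic acids link through C2, all other residues through C1.
expand_short_token <- function(token) {
  # name, then anomer (a/b/?), then linked position (digit, ? or "-")
  m <- regmatches(token, regexec("^(.+)([ab?])([0-9?-])$", token))[[1]]
  if (length(m) == 0) stop("Unrecognized token: ", token)
  name <- m[2]
  anomer <- m[3]
  pos <- m[4]
  # Fill in the anomeric carbon that the short form leaves out
  carbon <- if (name %in% c("Neu5Ac", "Neu5Gc", "Kdn")) 2L else 1L
  if (pos == "-") {
    paste0(name, "(", anomer, carbon, "-")  # reducing end stays open
  } else {
    paste0(name, "(", anomer, carbon, "-", pos, ")")
  }
}

tokens <- c("Neu5Aca3", "Galb3", "GalNAcb-")
paste0(vapply(tokens, expand_short_token, character(1), USE.NAMES = FALSE), collapse = "")
# returns "Neu5Ac(a2-3)Gal(b1-3)GalNAc(b1-"
```

A lookup keyed on the residue name keeps the rule explicit; a real parser also has to split the string into tokens and track branches, which this sketch leaves out.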
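The IUPAC-extended format is the most verbose of the three: it spells out the anomeric configuration (alpha or beta), the absolute configuration (D or L), and both carbons of every linkage, as in alpha-D-Man-(1->3).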
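To show how the extended and condensed forms line up, here is a minimal base R sketch that rewrites one extended residue into condensed notation. extended_to_condensed() is a hypothetical helper, not a glyparse function. It only handles the simple alpha/beta-D/L-Name-(x->y) pattern from the table above and ignores branches; for real data, use parse_iupac_extended().

```r
# Hypothetical helper, NOT part of glyparse: convert one IUPAC-extended
# residue such as "alpha-D-Man-(1->3)" into IUPAC-condensed "Man(a1-3)".
extended_to_condensed <- function(token) {
  pattern <- "^(alpha|beta)-([DL])-([A-Za-z0-9]+)-\\(([0-9])->([0-9?])\\)$"
  m <- regmatches(token, regexec(pattern, token))[[1]]
  if (length(m) == 0) stop("Unrecognized residue: ", token)
  anomer <- if (m[2] == "alpha") "a" else "b"
  # m[3] holds D/L; this sketch drops it, because condensed strings for
  # common residues usually leave the configuration implicit
  paste0(m[4], "(", anomer, m[5], "-", m[6], ")")
}

extended_to_condensed("alpha-D-Man-(1->3)")    # returns "Man(a1-3)"
extended_to_condensed("beta-D-GlcNAc-(1->4)")  # returns "GlcNAc(b1-4)"
```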
GlycoCT is used in the literature for precise representation and in databases like GlycomeDB. It spreads a single structure over multiple lines, which makes it harder to read than the IUPAC formats, but it is extremely precise.
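GlycoCT (condensed) lists residues in a RES section and the bonds between them in a LIN section. The minimal base R sketch below splits a GlycoCT string into those sections and separates base residues ("b" entries) from substituents ("s" entries). split_glycoct() is a hypothetical helper, not part of glyparse. The chitobiose snippet is hand-written for illustration and is assumed to encode GlcNAc(b1-4)GlcNAc; it ignores optional sections that real GlycoCT records may contain.

```r
# Hypothetical helper, NOT part of glyparse: split a GlycoCT (condensed)
# string into its RES and LIN sections. Other sections are ignored.
split_glycoct <- function(text) {
  lines <- trimws(strsplit(text, "\n", fixed = TRUE)[[1]])
  lines <- lines[lines != ""]
  res_at <- match("RES", lines)
  lin_at <- match("LIN", lines)
  if (is.na(res_at)) stop("No RES section found")
  res_end <- if (is.na(lin_at)) length(lines) else lin_at - 1
  res <- lines[seq.int(res_at + 1, length.out = res_end - res_at)]
  lin <- if (is.na(lin_at)) character(0) else
    lines[seq.int(lin_at + 1, length.out = length(lines) - lin_at)]
  # The letter after the index tells the entry type: b = base residue,
  # s = substituent (e.g. the n-acetyl group of GlcNAc)
  type <- sub("^[0-9]+([a-z]).*$", "\\1", res)
  list(residues = res[type == "b"], substituents = res[type == "s"], linkages = lin)
}

chitobiose <- paste(c(
  "RES",
  "1b:b-dglc-HEX-1:5",
  "2s:n-acetyl",
  "3b:b-dglc-HEX-1:5",
  "4s:n-acetyl",
  "LIN",
  "1:1d(2+1)2n",
  "2:1o(4+1)3d",
  "3:3d(2+1)4n"
), collapse = "\n")

lengths(split_glycoct(chitobiose))
# returns c(residues = 2L, substituents = 2L, linkages = 3L)
```

This also shows why GlycoCT is precise but verbose: each GlcNAc becomes a glucose base residue plus an n-acetyl substituent, and every attachment needs its own LIN line.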
WURCS (Web3 Unique Representation of Carbohydrate Structures) is used in the literature for complex structures and in databases like GlyTouCan.
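A WURCS 2.0 string is a series of slash-separated sections: a version, three counts (unique residues, residues, linkages), the unique residue descriptions, the residue sequence, and the linkages. The minimal base R sketch below splits the WURCS string from the auto_parse() example above into those parts. split_wurcs() is a hypothetical helper, not part of glyparse, and it assumes that no "/" appears inside a residue description, which is a simplification.

```r
# Hypothetical helper, NOT part of glyparse: split a WURCS 2.0 string into
# its slash-separated sections. Simplification: assumes no "/" appears
# inside a residue description.
split_wurcs <- function(wurcs) {
  parts <- strsplit(wurcs, "/", fixed = TRUE)[[1]]
  if (length(parts) < 4 || !startsWith(parts[1], "WURCS=")) {
    stop("Not a WURCS string: ", wurcs)
  }
  counts <- as.integer(strsplit(parts[2], ",", fixed = TRUE)[[1]])
  # "[a2122h-1b_1-5][a1122h-1b_1-5]" -> c("a2122h-1b_1-5", "a1122h-1b_1-5")
  unique_res <- strsplit(gsub("^\\[|\\]$", "", parts[3]), "][", fixed = TRUE)[[1]]
  # "1-2-3": which unique residue each residue (a, b, c, ...) refers to
  sequence <- as.integer(strsplit(parts[4], "-", fixed = TRUE)[[1]])
  # "a4-b1_b3-c1": C4 of residue a to C1 of residue b, C3 of b to C1 of c
  linkages <- if (length(parts) >= 5 && nzchar(parts[5])) {
    strsplit(parts[5], "_", fixed = TRUE)[[1]]
  } else {
    character(0)
  }
  # The header counts should agree with what was actually found
  found <- c(length(unique_res), length(sequence), length(linkages))
  if (!identical(counts, found)) warning("Header counts do not match the sections")
  list(version = sub("^WURCS=", "", parts[1]), counts = counts,
       unique_residues = unique_res, sequence = sequence, linkages = linkages)
}

w <- split_wurcs("WURCS=2.0/3,3,2/[a2122h-1b_1-5][a1122h-1b_1-5][a1122h-1a_1-5]/1-2-3/a4-b1_b3-c1")
w$unique_residues[w$sequence]  # residue description for residues a, b, c
w$linkages                     # returns c("a4-b1", "b3-c1")
```

Storing each distinct residue once and referring to it by index is what keeps WURCS compact for large structures; glyparse reads this particular string as Man(a1-3)Man(b1-4)Glc(b1-.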
If you work with glycoproteomics, you might encounter pGlyco’s parenthetical notation:
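```r
pglyco <- "(N(F)(N(H(H(N))(H(N(H))))))"
parse_pglyco_struc(pglyco)
#> <glycan_structure[1]>
#> [1] Hex(??-?)HexNAc(??-?)Hex(??-?)[HexNAc(??-?)Hex(??-?)]Hex(??-?)HexNAc(??-?)[dHex(??-?)]HexNAc(??-
#> # Unique structures: 1
```

This cryptic notation actually represents a complex N-glycan: a core-fucosylated, biantennary structure. Because pGlyco only records generic monosaccharides (H, N, F) and the branching topology, the parsed result uses Hex, HexNAc, and dHex with unknown linkages (??-?).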
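Because every residue in pGlyco notation is a single letter and nesting is expressed with parentheses, a monosaccharide composition can be read off directly. Here is a minimal base R sketch that checks the parentheses are balanced and then counts residues. pglyco_composition() is a hypothetical helper, not part of glyparse. The H, N, and F mappings follow the parse_pglyco_struc() output above, while A (NeuAc) and G (NeuGc) are assumptions about pGlyco’s alphabet.

```r
# Hypothetical helper, NOT part of glyparse: count residues in a pGlyco
# structure string after checking that its parentheses are balanced.
pglyco_composition <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  # Running nesting depth: "(" opens a group, ")" closes it
  depth <- cumsum(ifelse(chars == "(", 1L, ifelse(chars == ")", -1L, 0L)))
  if (length(depth) == 0 || any(depth < 0) || depth[length(depth)] != 0) {
    stop("Unbalanced parentheses: ", x)
  }
  codes <- chars[!chars %in% c("(", ")")]
  # H, N and F match the parse_pglyco_struc() output above;
  # A (NeuAc) and G (NeuGc) are assumptions about pGlyco's alphabet
  lookup <- c(H = "Hex", N = "HexNAc", F = "dHex", A = "NeuAc", G = "NeuGc")
  unknown <- setdiff(codes, names(lookup))
  if (length(unknown) > 0) {
    stop("Unknown residue code(s): ", paste(unknown, collapse = ", "))
  }
  residues <- unname(lookup[codes])
  # Named integer vector, in order of first appearance
  vapply(unique(residues), function(r) sum(residues == r), integer(1))
}

pglyco_composition("(N(F)(N(H(H(N))(H(N(H))))))")
# returns c(HexNAc = 4L, dHex = 1L, Hex = 4L)
```

The counts (4 Hex, 4 HexNAc, 1 dHex) match the residues in the parsed structure above. Composition alone loses the topology, which is exactly what parse_pglyco_struc() preserves.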
StrucGP uses its own letter-based encoding (for example, A2B2C1D1E2fedcba), which parse_strucgp_struc() reads.
glyparse transforms the chaos of glycan text formats
into order. No matter where your glycan data comes from (databases,
literature, or software tools), you can now parse it into a
glyrepr::glycan_structure object for further analysis. In fact,
the glyread package uses these parsing functions internally
when reading output from common glycopeptide identification
software.
Next steps:

- glyrepr package for structure manipulation
- glymotif for motif analysis of your parsed structures
- glyexp for experimental data analysis
- The glycoverse ecosystem!

Happy parsing! 🧬✨