It's amazing how one problem leads to another in research. My advisor asked me to design modular mimics of heparan sulfate (HS), a glycosaminoglycan that binds many different proteins in our body. The end goal was to mimic a given sequence of an HS molecule with a non-sugar molecule, resulting in a cost-effective and easily synthesizable therapeutic agent. In trying to achieve this goal, I came across a more fundamental problem.
The “Glycosaminoglycan Problem”
HS can, in principle, be built from 48 different disaccharides, although only 23 have been identified to date. Even with 23 disaccharides, 23 × 23 × 23 = 12,167 different hexasaccharide structures are possible, which is a very large pool of potential ligands for modulating protein function. Variability in these 12,167 molecules results from different sulfate group arrangements, different uronic acid structures, different uronic acid conformers, and different glycosidic bond torsions. What is not known is (1) how many of these structures are important in vivo, and (2) how diverse they really are, i.e., do small changes in structure (e.g., deletion of a single sulfate group) cause significant differences in protein binding?
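As a quick sanity check on the arithmetic above, here is a minimal Python sketch. It assumes, as the count above does, that a hexasaccharide is three disaccharide units in a row and that each position can independently hold any disaccharide. The function name `count_sequences` is my own, purely for illustration.

```python
def count_sequences(n_disaccharides: int, n_units: int) -> int:
    """Number of linear sequences of n_units disaccharides,
    assuming every position can hold any of n_disaccharides
    building blocks independently of its neighbors."""
    return n_disaccharides ** n_units


# A hexasaccharide is three disaccharide units long.
UNITS_PER_HEXASACCHARIDE = 3

identified = count_sequences(23, UNITS_PER_HEXASACCHARIDE)   # 23 known disaccharides
theoretical = count_sequences(48, UNITS_PER_HEXASACCHARIDE)  # 48 possible in principle

print(f"Hexasaccharides from 23 disaccharides: {identified:,}")
print(f"Hexasaccharides from 48 disaccharides: {theoretical:,}")
```

With the 23 identified disaccharides this reproduces the 12,167 figure; with all 48 theoretical disaccharides the same assumption gives 110,592. Real biosynthesis almost certainly does not allow every combination, so both numbers are upper bounds under this simple model.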
From a medicinal chemistry perspective, it would be a great challenge to target a given glycosaminoglycan-binding protein by engineering a specific HS sequence. Such a sequence need not be physiologically relevant but would certainly be therapeutically valuable. Thus, in an effort to design modular mimics of HS, I found myself digging deep into the glycosaminoglycan literature trying to understand and define the concept of glycosaminoglycan-sequence specificity!