Published: 2022-08-26
Journal: SCIENCE ADVANCES
PETAR PAJIC, SHICHEN SHEN, JUN QU, ALISON J. MAY, SARAH KNOX, STEFAN RUHL, OMER GOKCUMEN
Abstract
How novel gene functions evolve is a fundamental question in biology. Mucin proteins, a functionally but not evolutionarily defined group of proteins, allow the study of convergent evolution of gene function. By analyzing the genomic variation of mucins across a wide range of mammalian genomes, we propose that exonic repeats and their copy number variation contribute substantially to the de novo evolution of new gene functions. By integrating bioinformatic, phylogenetic, proteomic, and immunohistochemical approaches, we identified 15 undescribed instances of evolutionary convergence, where novel mucins originated by gaining densely O-glycosylated exonic repeat domains. Our results suggest that secreted proteins rich in proline are natural precursors for acquiring mucin function. Our findings have broad implications for understanding the role of exonic repeats in the parallel evolution of new gene functions, especially those involving protein glycosylation.
INTRODUCTION
Parallel, independent evolution resulting in similar genetic variants has been discussed as a common driver of convergent responses to adaptive pressures (1). This line of inquiry is exciting because instances of parallel evolution provide a natural framework for studying the relative contributions of selection and mutational constraints to genomic variation. Recent studies have provided evidence that parallel evolution is widespread in all branches of life (2). A considerable number of reported cases of parallel evolution involve recurrent structural variants that originate through convergent expansions of gene families in response to similar adaptive pressures. Examples include the recurrent duplications of amylase genes among animals consuming starch-rich diets (3), recurrent mutations in innate immune system proteins (4), species-specific gene duplications involved in caffeine synthesis in coffee and tea plants (5), and venom evolution through gene duplications in reptiles (6) and mammals (7).
Recent studies have implied that mucin genes, which are grouped on the basis of their function rather than evolutionary commonality, may have been particularly prone to convergent evolution (8, 9). Mucins are a group of functionally characterized glycoproteins, defined by the presence of repeated proline (P)-, threonine (T)-, and serine (S)-rich O-linked glycosylation sites (10) known as PTS repeats. Functionally, mucins play crucial roles in mediating signaling between epithelial cells, in forming mucous layers to lubricate various organs, and in providing a protective barrier against environmental insult (11). In addition, mucins form an interface with commensal and pathogenic microbes, thus contributing to both colonization by a physiological microflora and host defense against pathogens (12). In a disease-related context, mucins have been shown to play roles in the pathology of cystic fibrosis (13) and other lung diseases (14) as well as in various malignancies (15). Despite the widespread and growing interest in the functional and biomedical aspects of mucin proteins (16), the evolution of mucin genes is not well understood.
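As a rough illustration of the compositional definition above, the following Python sketch scans a protein sequence for windows enriched in proline, threonine, and serine. It is not the method used in this study: the function names, the window length, and the threshold are hypothetical choices made for illustration, and the toy sequence is invented. PTS richness alone does not establish O-glycosylation or mucin function.

```python
# Minimal sketch: flag PTS-rich stretches in a protein sequence.
# Assumptions (not from the source): window size, threshold, function names,
# and the toy sequence are all illustrative.

PTS = frozenset("PTS")


def pts_fraction(seq: str) -> float:
    """Return the fraction of residues that are proline, threonine, or serine."""
    if not seq:
        return 0.0
    return sum(aa in PTS for aa in seq.upper()) / len(seq)


def pts_rich_windows(seq: str, window: int = 20, threshold: float = 0.5):
    """Return merged half-open (start, end) intervals covered by windows
    whose PTS fraction is at least `threshold`."""
    seq = seq.upper()
    regions = []
    for start in range(0, len(seq) - window + 1):
        if pts_fraction(seq[start:start + window]) >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent window: extend the previous region.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Invented toy sequence: a PTS-poor flank, three copies of a PTS-rich
# 10-residue unit, and another PTS-poor flank.
toy = "MKLLVAGLLA" + "TSAPTTSPAP" * 3 + "GKDEELLKRA"
print(pts_rich_windows(toy, window=10, threshold=0.75))
```

On the toy sequence, the scan reports a single region spanning the three repeat units. Real analyses would additionally need to detect the repeat structure itself and to predict or measure glycosylation, neither of which this sketch attempts.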
Most genes with similar functions originate from duplication of a shared ancestral gene (17); that is, they are identical by descent. However, mucin genes in the human genome do not all share common ancestry. Instead, most genes with well-described mucin function in humans belong to two gene families, secreted gel-forming mucins and membrane-bound mucins, which likely evolved independently (8). Other mucins (MUC7, MUC22, and MUC16), which do not belong to these two major families, were named “orphans” by Dekker and coworkers (8) because they show no apparent orthology to other genes, including other mucins. The presence of two evolutionarily distinct mucin gene families, as well as the existence of scattered orphan mucins in the human genome, suggests that recurrent, lineage-specific evolution of mucin function may be a widespread evolutionary phenomenon in this functionally homologous, but genetically heterogeneous, group of genes. Thus, mucins provide an excellent model for studying the independent evolution of specific gene functions and for shedding light on the functional potential of nonconserved sequences. By studying the evolution of mucin genes in mammals, we put forward an evolutionary model for the generation of new gene functions, especially those pertaining to glycosylation.