Biogeographical and Genomic Analysis of Eleusine Species
Abstract
Eleusine Gaertn. (Poaceae, subfamily Chloridoideae) is a small genus of closely related but distinct diploid and tetraploid species endemic to Africa. Its members have been scrutinized using vegetative, floral, cytological, and molecular evidence, with sustained interest in their phylogeny and adaptations, partly because of the economic and ecological impacts of two of its species: a super crop (E. coracana) and a weed (E. indica). Studies to elucidate the genotypic and phenotypic relationships in E. coracana have relied largely on Single Nucleotide Polymorphisms (SNPs), although recent studies show that SNPs do not capture the large genomic variations that also contribute to phenotypic differences. In this thesis, I used environmental data to characterize the eco-geographical distribution of the different Eleusine species in Africa and investigated structural variations in E. coracana. Using the Maximum Entropy modeling software (Maxent), I characterized possible environmental predictors of the presence of Eleusine species in Africa based on collection records from the Global Biodiversity Information Facility (GBIF) and 33 bioclimatic and soil variables. Furthermore, I analyzed publicly available, paired-end, whole-genome E. coracana sequences from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) for structural variants and their genomic distribution, using custom bash and R scripts built on freely available bioinformatics tools. Maxent modeling revealed a high degree of variation in the predicted probability of occurrence of Eleusine species across the African continent and indicated possibly suitable environments in new locations. These environmental distribution findings need to be corroborated with known locality records (e.g., herbarium records) and field verification.
Whole-genome sequence (WGS) analysis revealed a high occurrence of structural variants (SVs) in Eleusine coracana, with 455 inversions, 18,990 duplications, and 103,338 deletions detected. This high incidence of deletion and duplication events is consistent with SV analyses in other plants, especially polyploids. Substantiating the genomic variations identified in E. coracana with a combined approach involving other high-performance SV callers would make the predictions more robust and reduce erroneous calls. The identified variants should lay the groundwork for future analyses of structural genomic variation. Together, the approaches in this research present the first data on environmental preferences and the influence of genomic variation in Eleusine and can advance our understanding of the genus.