Isolation and improvement of elements involved in high transgene expression

 Gene expression in cells is regulated at multiple steps, including transcription, post-transcriptional processing, and translation. To express useful genes introduced into plants efficiently, each step must be optimized. We are therefore analyzing the core promoter involved in transcription, the terminator involved in transcription termination and mRNA processing, the splicing mechanism that contributes to mRNA diversity, the internal cleavage sites related to mRNA stability, and translation efficiency. Through these analyses, we are isolating and improving the sequence elements involved in high expression. We also aim to provide the results to multiple companies for the production of vaccine proteins and growth hormones in plants (Fig. 1).

Fig.1 Flow of gene expression

 To optimize each step of gene expression, we analyze each regulatory process in detail and isolate and improve the sequence elements involved in high transgene expression.

Design of artificial genes

 How gene expression is controlled differs from gene to gene, and these differences can be explained by differences in sequence or in sequence-dependent structure. Translation efficiency, for example, varies among genes. We have obtained translation status data and sequence data for all mRNAs. By analyzing these two genome-wide datasets in silico, we are constructing a machine learning model that predicts the efficiency of translation from mRNA to protein. This model should deepen our understanding of the mechanisms that control gene expression, and it also makes it possible to select a 5'UTR sequence that enables high translation of a target gene. In addition, we are constructing machine learning models using comprehensive intracellular data on transcription initiation sites, splicing patterns, and poly(A) addition sites. Ultimately, we would like to use these machine learning models to design artificial genes that enable higher expression (Fig. 2).

Fig.2 Design of artificial genes

 In addition to analyzing mRNA translation status and stability, we will construct a machine learning model using comprehensive intracellular data on transcription initiation sites, splicing patterns, and poly(A) addition sites. Ultimately, we will design artificial genes that optimize the expression of useful genes.
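 As a rough illustration of how a translation-efficiency model could be used to choose a 5'UTR, the sketch below fits a small linear model and ranks candidate sequences by predicted efficiency. It is not the model described above. The features (GC fraction and the number of upstream AUGs), the ridge-regularized least-squares fit, the function names, and all training values are assumptions introduced only for this example, and the efficiency numbers are invented toy values, not measurements.

```python
"""Toy sketch: rank candidate 5'UTRs by predicted translation efficiency."""


def featurize(utr):
    """Return [bias, GC fraction, upstream AUG count] for a 5'UTR (hypothetical features)."""
    seq = utr.upper().replace("T", "U")  # accept DNA or RNA letters
    gc = sum(base in "GC" for base in seq) / len(seq)
    return [1.0, gc, float(seq.count("AUG"))]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(utrs, efficiencies, ridge=1e-6):
    """Fit linear weights by solving the ridge-regularized normal equations."""
    rows = [featurize(u) for u in utrs]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, efficiencies)) for i in range(n)]
    return solve(xtx, xty)


def predict(weights, utr):
    """Predicted translation efficiency for one 5'UTR."""
    return sum(w * f for w, f in zip(weights, featurize(utr)))


def rank_candidates(weights, candidates):
    """Sort candidate 5'UTRs from highest to lowest predicted efficiency."""
    return sorted(candidates, key=lambda u: predict(weights, u), reverse=True)


# Invented toy training data for illustration only; not measurements.
toy_utrs = ["AAAAAUUUUU", "GGGGGCCCCC", "AUGAAAAAAA", "AUGGCAUGCC", "GCAUAUAUGC"]
toy_efficiencies = [1.0, 0.5, 0.65, 0.1, 0.5]
weights = fit(toy_utrs, toy_efficiencies)

if __name__ == "__main__":
    for utr in rank_candidates(weights, ["AUGAUGAUGA", "AAAAACCCCC"]):
        print(utr, round(predict(weights, utr), 3))
```

 A model trained on genome-wide data would use far richer sequence and structure features and would need to be validated on held-out genes. The ranking step, however, would look much the same: score each candidate 5'UTR and keep the highest-scoring ones for experimental testing.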

Phenotypic and genetic polymorphisms in plants

 Nucleotide and phenotypic polymorphisms are maintained even within a single species. Because plants cannot move from one place to another, natural selection strongly affects their genotypes and phenotypes during local adaptation. These polymorphisms are therefore important for the persistence and migration of a species. Intraspecific polymorphisms are important not only in wild plants but also in agricultural varieties, where they can be key to breeding for specific characteristics such as taste and disease resistance.

Fig.3 Flowering time variation in a species

 By combining phenotypic polymorphisms with genome-wide nucleotide variation, which can be obtained by next-generation sequencing, in association analyses, the gene regions related to a trait can be detected. We focus on traits related to plant reproductive success, such as flowering time and annual or perennial life history, because these traits are directly related to biodiversity and breeding.

Fig.4 Genotype-Phenotype association analysis
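
 The idea behind a genotype-phenotype association analysis can be sketched with a minimal single-marker scan: for each variant, measure how strongly allele counts correlate with the trait across accessions, then rank the variants. The sketch below uses the squared Pearson correlation as the score. The function names, SNP identifiers, and all values are hypothetical and invented for illustration. Practical analyses typically also correct for population structure and multiple testing, which this sketch omits.

```python
"""Toy sketch: single-marker association scan for a quantitative trait."""
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant phenotype: no signal
    return sxy / (sxx * syy) ** 0.5


def association_scan(genotypes, phenotype):
    """Score each SNP by r^2 with the phenotype and sort from strongest to weakest.

    genotypes: dict mapping SNP id -> allele counts (0, 1, or 2) per accession
    phenotype: trait values in the same accession order
    """
    scores = {snp: pearson_r(calls, phenotype) ** 2
              for snp, calls in genotypes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data for illustration only; not measurements.
toy_flowering_days = [20, 22, 21, 35, 36, 34]  # one value per accession
toy_genotypes = {
    "snp_a": [0, 0, 0, 2, 2, 2],  # tracks the early/late flowering split
    "snp_b": [0, 2, 0, 2, 0, 2],  # mostly unrelated to the trait
    "snp_c": [1, 1, 1, 1, 1, 1],  # monomorphic, carries no information
}

if __name__ == "__main__":
    for snp, score in association_scan(toy_genotypes, toy_flowering_days):
        print(snp, round(score, 3))
```

 Markers at the top of the ranking point to candidate gene regions for the trait, which then need confirmation by independent experiments.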