FAQ

PlantCRE is a deep learning-based web server that has learned the cis-regulatory code of plant genes from four species: Arabidopsis, Zea mays, Oryza sativa and Solanum lycopersicum. For a given gene or set of genes, it identifies the plant cis-regulatory elements (CREs) and locates the exact position of each CRE.
Citation: Yang Qiu#, Lifen Liu#, Jiali Yan#, Xianglei Xiang, Shouzhe Wang, Yun Luo, Kaixuan Deng, Jieting Xu, Minliang Jin, Xiaoyu Wu, Liwei Cheng, Ying Zhou, Weibo Xie, Hai-Jun Liu, Alisdair R. Fernie, Xuehai Hu* and Jianbing Yan*. Precise engineering of gene expression by editing plasticity.

Frequently Asked Questions (and Answers)

Q: What is the architecture of the Basenji2 models?

A: Basenji2 is a deep learning model designed for genomic sequence analysis. It is primarily used to interpret noncoding genetic variants and predict their impact on gene expression levels. Here, we modified its architecture to build gene expression prediction models specifically for plant species. Each model consists of seven ConvBlocks, several dilated residual blocks, a convolution layer and a fully connected layer with a single node. The number of blocks and the pooling sizes were chosen to give a 192-bp bin size, which covers two nucleosome core particles. The dilated residual blocks spread information along the sequence and model long-range interactions; the number of these blocks differs between species. For structural details, please refer to https://github.com/liulifenyf/PlantCRE.
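
To give a feel for how stacked dilated blocks reach long-range context, here is a minimal sketch that computes the receptive field of a stack of dilated convolutions. Only the 192-bp bin size comes from the answer above. The kernel width, the dilation schedule and the assumption of one dilated convolution per block are illustrative values, not the settings of the PlantCRE models, which are documented in the repository.

```python
BIN_SIZE_BP = 192  # bin size stated in the FAQ


def receptive_field_bins(kernel_width, dilations):
    """Receptive field (in bins) of stacked dilated convolutions.

    Each layer with dilation d widens the field by (kernel_width - 1) * d.
    """
    return 1 + sum((kernel_width - 1) * d for d in dilations)


# Hypothetical settings, chosen only for illustration.
kernel_width = 3
dilations = [1, 2, 4, 8]

bins = receptive_field_bins(kernel_width, dilations)
span_bp = bins * BIN_SIZE_BP
print(f"{bins} bins = {span_bp} bp of sequence context per output position")
```

Adding further blocks with larger dilations widens the context quickly, which is one plausible reason the number of blocks is tuned per species.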

Q: What is the input of the Basenji2 models?

A: For each gene, we take a segment of genomic sequence around its transcription start site (TSS) and one-hot encode it. The encoded sequence is the input to the model. The length of the segment differs between species; for details, please refer to the introduction of each model.
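
The input preparation described above can be sketched as follows. The channel order (A, C, G, T, with N as all zeros) follows the encoding given in the contribution-score answer of this FAQ. The helper names, the symmetric flank, the N-padding at chromosome ends and the toy sequence are assumptions for illustration; strand handling and the real window lengths are not covered here.

```python
# Channel order A, C, G, T; N (or any other symbol) becomes all zeros.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def extract_window(chrom_seq, tss, flank):
    """Return chrom_seq[tss - flank : tss + flank] (0-based, end exclusive).

    Positions beyond the chromosome ends are padded with N (an assumption).
    """
    start, end = tss - flank, tss + flank
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(chrom_seq))
    core = chrom_seq[max(0, start):min(len(chrom_seq), end)]
    return left_pad + core + right_pad


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-element rows."""
    return [list(ONE_HOT.get(base, [0, 0, 0, 0])) for base in seq.upper()]


chrom = "ACGTNACGTTGCA"  # toy chromosome, hypothetical
window = extract_window(chrom, tss=2, flank=4)
encoded = one_hot_encode(window)
print(window, len(encoded))
```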

Q: What is the output of the Basenji2 models?

A: For Zea mays, we used the maximum TPM across multiple RNA-seq experiments from different tissues as the target gene expression level. For the other three species, we used the median TPM across multiple RNA-seq experiments as the gene expression level. All RNA-seq experiments used to derive these targets are listed at https://github.com/liulifenyf/PlantCRE.
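
As a minimal sketch of how such per-gene targets could be computed, the snippet below takes the maximum for Zea mays and the median for the other species. The function name and the TPM values are hypothetical, and the sketch assumes no log transform or other normalisation is applied, which may differ from the actual pipeline.

```python
from statistics import median


def expression_target(tpm_values, species):
    """Summarise one gene's TPM values across RNA-seq experiments."""
    if species == "Zea mays":
        return max(tpm_values)   # maximum across experiments
    return median(tpm_values)    # median across experiments


tpm = [0.0, 3.5, 12.0, 7.25]  # toy TPM values for one gene
maize_target = expression_target(tpm, "Zea mays")
rice_target = expression_target(tpm, "Oryza sativa")
print(maize_target, rice_target)
```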

Q: How are contribution scores for a gene calculated?

A: We use the gradient × input interpretability algorithm to calculate contribution scores. Gradient × input is a gradient-based method that estimates contribution scores by back-propagation through the network. Specifically, given a one-hot encoded input sequence (A = [1,0,0,0], C = [0,1,0,0], G = [0,0,1,0], T = [0,0,0,1], N = [0,0,0,0]), we first calculate the gradient of the prediction with respect to the input and then take the element-wise product of this gradient and the input. We then average the resulting values over the four base channels at each position. The result is one contribution score per base, so the score vector has the same length as the input sequence.
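
The steps above can be sketched on a toy model. The model below (tanh of a weighted sum) and its weights are hypothetical stand-ins for the trained network, chosen because the gradient can be written out by hand; in practice the gradient would come from the deep learning framework's automatic differentiation.

```python
import math

ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def toy_model(x, weights):
    """Hypothetical stand-in for the network: tanh of a weighted sum."""
    z = sum(w * v for w_row, x_row in zip(weights, x)
            for w, v in zip(w_row, x_row))
    return math.tanh(z)


def toy_gradient(x, weights):
    """Gradient of toy_model w.r.t. x: (1 - tanh(z)^2) * weights."""
    scale = 1.0 - toy_model(x, weights) ** 2
    return [[scale * w for w in w_row] for w_row in weights]


def gradient_x_input(x, grad):
    """Element-wise gradient * input, averaged over the four channels."""
    product = [[g * v for g, v in zip(g_row, x_row)]
               for g_row, x_row in zip(grad, x)]
    return [sum(row) / 4 for row in product]


seq = "ACGN"
x = [ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq]
weights = [[0.5, -0.2, 0.1, 0.0],   # toy weights, one row per position
           [0.3, 0.4, -0.1, 0.2],
           [0.0, 0.1, -0.6, 0.2],
           [0.9, 0.9, 0.9, 0.9]]

scores = gradient_x_input(x, toy_gradient(x, weights))
print(scores)  # one score per base; the N position scores exactly 0
```

Because the input is one-hot, only the channel of the observed base contributes at each position, and an N position always scores zero.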

Q: How are candidate CREs identified from the contribution scores?

A: To identify CREs for each gene, we developed a peak-calling algorithm based on per-base contribution scores. The peak-calling code can be found at https://github.com/liulifenyf/PlantCRE.
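
For readers unfamiliar with peak calling on a score track, here is a generic threshold-based sketch. It is not the PlantCRE algorithm, whose details are in the repository. The use of absolute scores, the threshold, the gap merging and the minimum width are all assumptions made for illustration.

```python
def call_peaks(scores, threshold, min_width=1, max_gap=0):
    """Return half-open (start, end) intervals where |score| >= threshold.

    Runs separated by at most max_gap bases are merged, and merged runs
    shorter than min_width are discarded.
    """
    runs, start = [], None
    for i, s in enumerate(scores):
        if abs(s) >= threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(scores)))

    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return [(s, e) for s, e in merged if e - s >= min_width]


toy_scores = [0.0, 0.1, 0.8, 0.9, 0.05, 0.7, 0.0, 0.0, -0.6, 0.0]
peaks = call_peaks(toy_scores, threshold=0.5, min_width=2, max_gap=1)
print(peaks)
```

The returned intervals are offsets within the input window; mapping them back to genomic coordinates requires the window start and the gene's strand.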

Q: How can I obtain all candidate CREs identified by PlantCRE for a specific species?

A: You can download the full set of more than 700,000 candidate CREs for all 46,272 maize genes at https://github.com/liulifenyf/PlantCRE.
