Case study

Terpene composition is a phenotypic trait that affects consumer preference, because terpenes are responsible for the myriad flavors and aromas of cannabis (Booth and Bohlmann, 2019). Variation in cannabis terpene profiles arises from variation within the cannabis terpene synthase (CsTPS) gene family as well as from differential expression of CsTPS genes between varieties (Booth et al., 2020). In this case study, we demonstrate the use of CannabisGDB by constructing a phylogenetic tree of the CsTPSs from CsPK, CsFN, CsJLD and CsCBD, together with a heatmap comparing the transcript levels of the CsTPSs identified in CsJLD across nine cannabis varieties (Black Lime, Terple, Valley Fire, Cherry Chem, Canna Tsu, Black Berry Kush, Mama Thai, White Cookies and Sour Diesel; Zager et al., 2019).

The TPS family has been split into three classes: Class I consists of TPS-c, TPS-e/f and TPS-h (Selaginella specific); Class II includes TPS-d (gymnosperm specific); and Class III is composed of TPS-a, TPS-b and TPS-g (Chen et al., 2011). The CsTPS family has 22 records in UniProt carrying two Pfam annotations, PF01397 (Terpene_synth) and PF03936 (Terpene_synth_C). Users can search for CsTPS genes in CannabisGDB using these two Pfam annotations. On the 'Search Page', enter 'PF01397 PF03936' as keywords in the Pfam ID search field (Figure 1). After searching, users can filter the results by PF01397 and PF03936 (Figure 2). After sorting by variety, 217 genes remain, including 26 in CsCBD, 32 in CsFN, 58 in CsJLD and 48 in CsPK (Figure 2).

Figure 1. Use 'PF01397 PF03936' as keywords in the Pfam ID search field.

Figure 2. Filter the data with PF01397 and PF03936.
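
The filtering step can also be reproduced offline on an exported result table. The following is a minimal sketch, assuming that the filter keeps genes annotated with both Pfam domains; the records, gene IDs and tuple layout are hypothetical and do not reflect the actual CannabisGDB export format.

```python
from collections import Counter

# Hypothetical records: (gene_id, variety, set of Pfam IDs).
# These are illustrative placeholders, not real CannabisGDB entries.
records = [
    ("geneA", "CsJLD", {"PF01397", "PF03936"}),
    ("geneB", "CsJLD", {"PF03936"}),
    ("geneC", "CsPK", {"PF01397", "PF03936", "PF00067"}),
    ("geneD", "CsFN", {"PF00067"}),
]

# The two terpene synthase domains named in the text.
REQUIRED = {"PF01397", "PF03936"}


def filter_tps(records, required=REQUIRED):
    """Keep records whose Pfam annotations include every required ID."""
    return [r for r in records if required <= r[2]]


def count_by_variety(records):
    """Count the remaining genes per variety."""
    return Counter(variety for _, variety, _ in records)


tps = filter_tps(records)
counts = count_by_variety(tps)
print(counts)
```

If the intended filter is "either domain" rather than "both", the subset test `required <= r[2]` would be replaced by an intersection test such as `required & r[2]`.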

Users can obtain the protein sequences encoded by these 217 genes through the 'Gene Page' (Figure 3). The phylogenetic tree was generated with MEGA X (Kumar et al., 2018) using the Maximum Likelihood method with 1000 bootstrap replicates, based on the protein sequences from the chromosome-anchored assemblies (CsCBD, CsFN, CsPK and CsJLD), with Arabidopsis thaliana TPSs (AtTPS1-AtTPS32) defined as outgroups. Our results indicate that the CsTPSs can be classified into two classes comprising five clades (Figure 4). Six-sevenths of the CsTPSs are members of Class III, and no Class II member was identified in these four cannabis varieties. Among the five clades, the TPS-b clade is the largest (69 proteins), while the TPS-c clade is the smallest (5 proteins). Interestingly, all members of the TPS-c clade were found on the sex chromosomes.

Figure 3. Obtain the protein sequences through 'Gene Page'.
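
Before alignment and tree building, the downloaded sequences usually need to be collected into a single file together with the outgroup sequences. The following is a minimal sketch of a FASTA reader, assuming the sequences have been saved in FASTA format; the headers and sequences shown are hypothetical placeholders, not real CsTPS or AtTPS entries.

```python
def parse_fasta(text):
    """Return {sequence ID: sequence} from FASTA-formatted text.

    The ID is the first whitespace-delimited token after '>'.
    """
    chunks = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()[0]
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)
    # Join the wrapped sequence lines of each record.
    return {h: "".join(parts) for h, parts in chunks.items()}


# Hypothetical example input (placeholder IDs and residues).
example = """>CsJLD_tps_demo1 hypothetical protein
MSTQ
VLAS
>AtTPS_demo outgroup placeholder
MAAL
"""

seqs = parse_fasta(example)
print(seqs)
```

The resulting dictionary can be written back out as one combined FASTA file and loaded into MEGA X for alignment and tree construction.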


Figure 4. Phylogenetic analysis of CsTPS. The phylogenetic tree was generated with MEGA X (Kumar et al., 2018).
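
Once each protein has been assigned to a clade from the tree, the class membership follows directly from the clade-to-class scheme of Chen et al. (2011). The following is a minimal sketch of that tally; the gene names and clade assignments in `demo` are hypothetical and are not the results reported here.

```python
from collections import Counter

# Clade-to-class mapping as described in the text (Chen et al., 2011).
CLASS_OF_CLADE = {
    "TPS-c": "Class I",
    "TPS-e/f": "Class I",
    "TPS-h": "Class I",
    "TPS-d": "Class II",
    "TPS-a": "Class III",
    "TPS-b": "Class III",
    "TPS-g": "Class III",
}


def class_fractions(clade_by_gene):
    """Return the fraction of genes falling into each TPS class."""
    counts = Counter(CLASS_OF_CLADE[c] for c in clade_by_gene.values())
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical clade assignments, for illustration only.
demo = {"g1": "TPS-b", "g2": "TPS-b", "g3": "TPS-a", "g4": "TPS-c"}

fractions = class_fractions(demo)
print(fractions)
```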

Using the heatmap tool in CannabisGDB, users can create an interactive heatmap through a suite of parameter settings located in the sidebar panel. As an example, we construct a heatmap for the 58 CsTPS genes identified in CsJLD using the expression data of Zager et al. (2019) (Figures 5 and 6). After opening the heatmap tool, users select the reference genome (CsJLD), fill in the gene IDs (the default gene list contains the 58 CsTPS genes), and click 'GO' to generate the heatmap (Figure 5, steps 1-4). Users can personalize the analysis by creating a new heatmap (Figure 5, steps 5-6), clustering the samples (Figure 5, steps 5 and 7), or selecting genes from the same TPS clade according to the results of the phylogenetic analysis and then right-clicking and choosing 'Annotate Selection' to add annotation information (Figure 5, step 7). From the heatmap, we found that all TPS-e/f clade genes were either not expressed or expressed at low levels in the nine cannabis varieties, whereas all TPS-g clade genes were highly expressed in the nine varieties (Figure 6).

Figure 5. Step-by-step instructions for using the heatmap tool provided by CannabisGDB.

Figure 6. Expression levels of CsTPS genes shown in a heatmap generated by the heatmap tool in CannabisGDB.
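
Observations such as "not expressed or expressed at low level" can be checked on the underlying expression table as well as read from the heatmap. The following is a minimal sketch, assuming a table of expression values per gene across varieties; the gene names, values and the threshold of 1.0 are hypothetical, and the row-wise z-score shown is a common heatmap scaling, not necessarily the one used by the CannabisGDB tool.

```python
from statistics import mean, pstdev


def row_zscores(values):
    """Scale one gene's expression values to mean 0 and unit variance."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with constant expression has no variation to scale.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def low_expressed(expr, threshold=1.0):
    """List genes whose expression stays below the threshold in every sample."""
    return [gene for gene, vals in expr.items() if max(vals) < threshold]


# Hypothetical expression values across three varieties.
expr = {
    "tps_demo1": [0.0, 0.2, 0.1],
    "tps_demo2": [10.0, 20.0, 30.0],
}

scaled = {gene: row_zscores(vals) for gene, vals in expr.items()}
print(low_expressed(expr))
```

Row-wise scaling highlights differences between varieties for each gene, but it hides absolute expression levels, which is why the low-expression check is applied to the unscaled values.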



Booth JK, Bohlmann J (2019) Terpenes in Cannabis sativa - From plant genome to humans. Plant Sci 284:67–72.

Booth JK, Yuen MMS, Jancsik S, Madilao LL, Page JE, Bohlmann J (2020) Terpene Synthases and Terpene Variation in Cannabis sativa. Plant Physiol 184:130–147.

Chen F, Tholl D, Bohlmann J, Pichersky E (2011) The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant J 66:212–229.

Kumar S, Stecher G, Li M, Knyaz C, Tamura K (2018) MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol 35:1547–1549.

Zager JJ, Lange I, Srividya N, Smith A, Lange BM (2019) Gene Networks Underlying Cannabinoid and Terpenoid Accumulation in Cannabis. Plant Physiol 180:1877–1897.

Please Cite

Cai, S., Zhang, Z., Huang, S., Bai, X., Huang, Z., Zhang, Y. J., Huang, L., Tang, W., Haughn, G., You, S. and Liu, Y. (2021) CannabisGDB: a comprehensive genomic database for Cannabis Sativa L. Plant Biotechnol J,