--- title: "Demos for canprot" output: html_vignette: mathjax: null vignette: > %\VignetteIndexEntry{Demos for canprot} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: canprot.bib link-citations: yes csl: elementa.csl --- ```{r setup, include=FALSE} library(canprot) library(CHNOSZ) oldopt <- options(width = 80) ``` ```{r HTML, include=FALSE} Zc <- "ZC" ``` The **canprot** package calculates chemical metrics of proteins from amino acid compositions. This vignette was compiled on `r Sys.Date()` with **canprot** version `r packageDescription("canprot")$Version`. Next vignette: [Introduction to canprot](introduction.html) ## canprot demo #1: Thermophiles Run the demo using `demo("thermophiles")`. For this demo, just the output is shown below. The **canprot** functions used are: - `calc_metrics()`: Calculates metrics named in an argument. The metrics calculated here are: - `S0g`: standard specific entropy - `Zc`: carbon oxidation state - `cplab`: This is not a function, but an object that has formatted text labels for each metric. - `add_hull()`: Adds a convex hull around data points ```{r echo = FALSE} knitr::read_chunk("../demo/thermophiles.R") ``` ```{r thermophiles_demo_body, out.width = "100%", fig.align = "center", fig.width = 8, fig.height = 6, echo = FALSE, message = FALSE, dpi = 150} ``` The data are from @DBCS23 for methanogen genomes (amino acid composition and optimal growth temperature) and from @LLX+24 for *Nitrososphaeria* MAGs (genome assemblies and habitat and respiration types). The plots reveal that proteins tend to have higher specific entropy in thermophilic genomes and MAGs from thermal habitats compared to mesophilic genomes and MAGs from nonthermal habitats, for a given carbon oxidation state. This implies that, after correcting for *Z*C, proteins in thermophiles have a more negative derivative of the standard Gibbs energy per gram of protein with respect to temperature. ## canprot demo #2: Subcellular locations Run the demo using `demo("locations")`. The code and output of the demo are shown below. The **canprot** functions used are: - `human_aa()`: Gets amino acid compositions of human proteins from UniProt IDs - `plength()`: Calculates protein length (this line is commented out) - `Zc()`: Calculates carbon oxidation state - `pI()`: Calculates isoelectric point - `add_cld()`: Adds compact letter display to a boxplot ```{r echo = FALSE} knitr::read_chunk("../demo/locations.R") ``` ```{r locations_demo_setup, echo = FALSE} ``` ```{r locations_demo_body, out.width = "100%", fig.align = "center", fig.width = 8, fig.height = 4.5, dpi = 150} ``` The plots show carbon oxidation state (`r Zc`) and isoelectric point (pI) for human proteins in different subcellular locations. The localization data is from Table S6 of @TAW+17, filtered to include proteins that have both a validated location and only one predicted location. ## canprot demo #3: Redoxins Run the demo using `demo("redoxins")`. For this demo, just the output is shown below. The **canprot** functions used are: - `read_fasta()`: Reads a FASTA sequence file and returns amino acid compositions of proteins. Additional processing is performed by using the following arguments: - `type` to read header lines - `iseq` to read specific sequences - `start` and `stop` to read segments of the sequences - `Zc()`: Calculates carbon oxidation state ```{r echo = FALSE} knitr::read_chunk("../demo/redoxins.R") ``` ```{r redoxins_demo_body, out.width = "75%", fig.align = "center", fig.width = 6, fig.height = 5, echo = FALSE, message = FALSE, dpi = 100} ``` This is an *exploratory analysis* for hypothesis generation about evolutionary links between midpoint reduction potential and `r Zc` of proteins. The reduction potential data was taken from @ABH97 and @HSJ+99 for *E. coli* and spinach proteins, respectively. This plot is modified from Fig. 5 of [this preprint](https://doi.org/10.1101/004127); the figure did not appear in the published paper. ```{r reset, include=FALSE} options(oldopt) ``` ## References