Demos for canprot

The canprot package calculates chemical metrics of proteins from amino acid compositions. This vignette was compiled on 2025-02-07 with canprot version 2.0.0-2.

Next vignette: Introduction to canprot

canprot demo #1: Thermophiles

Run the demo using demo("thermophiles"). For this demo, just the output is shown below.

The canprot functions used are:

calc_metrics(): Calculates metrics named in an argument. The metrics calculated here are:
- S0g: standard specific entropy
- Zc: carbon oxidation state
cplab: This is not a function, but an object that has formatted text labels for each metric.
add_hull(): Adds a convex hull around data points
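add_hull() outlines a group of points with its convex hull. The base-R sketch below shows the general technique using grDevices::chull(). It is not canprot's implementation, and the helper names hull_vertices() and draw_hull() are hypothetical.

```r
# Hypothetical helper (not canprot's add_hull()): vertices of the convex hull of (x, y)
hull_vertices <- function(x, y) {
  idx <- grDevices::chull(x, y)  # indices of points on the hull, in clockwise order
  data.frame(x = x[idx], y = y[idx], index = idx)
}

# Hypothetical helper: outline the hull on the current plot
draw_hull <- function(x, y, ...) {
  hv <- hull_vertices(x, y)
  graphics::polygon(hv$x, hv$y, ...)  # polygon() closes the outline automatically
  invisible(hv)
}

# Example (requires an open plot):
#   plot(x, y); draw_hull(x, y, border = "gray")
```

Interior points are never hull vertices, so drawing the polygon through the returned points encloses the whole group.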

The data are from Dick et al. (2023) for methanogen genomes (amino acid composition and optimal growth temperature) and from Luo et al. (2024) for Nitrososphaeria MAGs (genome assemblies and habitat and respiration types). The plots reveal that proteins tend to have higher specific entropy in thermophilic genomes and MAGs from thermal habitats compared to mesophilic genomes and MAGs from nonthermal habitats, for a given carbon oxidation state. This implies that, after correcting for Z_C, proteins in thermophiles have a more negative derivative of the standard Gibbs energy per gram of protein with respect to temperature.
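The last step of the argument follows from the thermodynamic identity relating Gibbs energy and entropy at constant pressure. Taking S0g to be the standard entropy per gram, S°/M for a protein of mass M (the symbol M is introduced here for illustration), the per-gram form is:

```latex
\left(\frac{\partial G^\circ}{\partial T}\right)_P = -S^\circ
\qquad\Longrightarrow\qquad
\left(\frac{\partial (G^\circ/M)}{\partial T}\right)_P = -\frac{S^\circ}{M} = -S^\circ_g
```

So at a fixed Z_C, a higher specific entropy corresponds to a more negative temperature derivative of the Gibbs energy per gram.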

canprot demo #2: Subcellular locations

Run the demo using demo("locations"). The code and output of the demo are shown below.

The canprot functions used are:

human_aa(): Gets amino acid compositions of human proteins from UniProt IDs
plength(): Calculates protein length (the line that uses it is commented out in the code below)
Zc(): Calculates carbon oxidation state
pI(): Calculates isoelectric point
add_cld(): Adds a compact letter display to a boxplot
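Zc() works from amino acid compositions. To illustrate the underlying formula, the sketch below computes Z_C from an elemental formula, assuming the usual formal oxidation states H = +1, N = −3, O = −2 and S = −2, so that Z_C = (Z − n_H + 3n_N + 2n_O + 2n_S) / n_C for a species with charge Z. The helper zc_from_formula() is hypothetical and is not canprot's Zc().

```r
# Hypothetical helper (not canprot's Zc()): carbon oxidation state from an
# elemental formula, assuming H = +1, N = -3, O = -2, S = -2 and net charge Z
zc_from_formula <- function(C, H, N = 0, O = 0, S = 0, Z = 0) {
  (Z - H + 3 * N + 2 * O + 2 * S) / C
}

zc_from_formula(C = 2, H = 5, N = 1, O = 2)   # glycine, C2H5NO2: 1
zc_from_formula(C = 3, H = 7, N = 1, O = 2)   # alanine, C3H7NO2: 0
# Gly-Ala dipeptide = glycine + alanine - H2O = C5H10N2O3
zc_from_formula(C = 5, H = 10, N = 2, O = 3)  # 0.4
```

Forming a peptide bond removes one H2O: losing two H adds 2 to the numerator and losing one O subtracts 2, so the numerator is unchanged. That is why the dipeptide's Z_C equals the carbon-weighted mean of its amino acids, (2 × 1 + 3 × 0) / 5 = 0.4.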

# Load the package
library(canprot)
# Read SI table
file <- system.file("extdata/protein/TAW+17_Table_S6_Validated.csv", package = "canprot")
dat <- read.csv(file)
# Keep only proteins with validated location
dat <- dat[dat$Reliability == "Validated", ]
# Keep only proteins with one annotated location
dat <- dat[rowSums(dat[, 4:32]) == 1, ]

# Get the amino acid compositions
aa <- human_aa(dat$Uniprot)
# Put the location into the amino acid data frame
aa$location <- dat$IF.main.protein.location

# Use top locations (and their colors) from Fig. 2B of Thul et al., 2017
locations <- c("Cytosol","Mitochondria","Nucleoplasm","Nucleus","Vesicles","Plasma membrane")
col <- c("#194964", "#2e6786", "#8a2729", "#b2333d", "#e0ce1d", "#e4d71c")
# Keep the proteins in these locations
aa <- aa[aa$location %in% locations, ]
## Keep only proteins with length between 100 and 2000
#aa <- aa[plength(aa) >= 100 & plength(aa) <= 2000, ]

# Get amino acid composition for proteins in each location
# (Loop over location names with lapply)
aalist <- lapply(locations, function(location) aa[aa$location == location, ] )

# Set up plot
par(mfrow = c(1, 2))
titles <- c(Zc = "Carbon oxidation state", pI = "Isoelectric point")
# Calculate Zc and pI
for(metric in c("Zc", "pI")) {
  datlist <- lapply(aalist, metric)
  bp <- boxplot(datlist, ylab = cplab[[metric]], col = col, show.names = FALSE)
  add_cld(datlist, bp)
  # Make rotated labels
  x <- (1:6) + 0.1
  y <- par()$usr[3] - 1.5 * strheight("A")
  text(x, y, locations, srt = 25, adj = 1, xpd = NA)
  axis(1, labels = FALSE)
  title(titles[metric], font.main = 1)
}

The plots show carbon oxidation state (Z_C) and isoelectric point (pI) for human proteins in different subcellular locations. The localization data are from Table S6 of Thul et al. (2017), filtered to include only proteins that have a validated location and exactly one annotated location.

canprot demo #3: Redoxins

Run the demo using demo("redoxins"). For this demo, just the output is shown below.

The canprot functions used are:

read_fasta(): Reads a FASTA sequence file and returns amino acid compositions of proteins. These arguments control additional processing:
- type to read header lines
- iseq to read specific sequences
- start and stop to read segments of the sequences
Zc(): Calculates carbon oxidation state

This is an exploratory analysis for generating hypotheses about evolutionary links between the midpoint reduction potential and Z_C of proteins. The reduction potential data were taken from Åslund et al. (1997) and Hirasawa et al. (1999) for E. coli and spinach proteins, respectively. This plot is modified from Fig. 5 of this preprint; the figure did not appear in the published paper.

References

Åslund F, Berndt KD, Holmgren A. 1997. Redox potentials of glutaredoxins and other thiol-disulfide oxidoreductases of the thioredoxin superfamily determined by direct protein-protein redox equilibria. Journal of Biological Chemistry 272(49): 30780–30786. doi: 10.1074/jbc.272.49.30780

Dick JM, Boyer GM, Canovas PA, Shock EL. 2023. Using thermodynamics to obtain geochemical information from genomes. Geobiology 21(2): 262–273. doi: 10.1111/gbi.12532

Hirasawa M, Schürmann P, Jacquot J-P, Manieri W, Jacquot P, Keryer E, Hartman FC, Knaff DB. 1999. Oxidation-reduction properties of chloroplast thioredoxins, ferredoxin:thioredoxin reductase, and thioredoxin f-regulated enzymes. Biochemistry 38(16): 5200–5205. doi: 10.1021/bi982783v

Luo Z-H, Li Q, Xie Y-G, Lv A-P, Qi Y-L, Li M-M, Qu Y-N, Liu Z-T, Li Y-X, Rao Y-Z, et al. 2024. Temperature, pH, and oxygen availability contributed to the functional differentiation of ancient Nitrososphaeria. The ISME Journal 18(1): wrad031. doi: 10.1093/ismejo/wrad031

Thul PJ, Åkesson L, Wiking M, Mahdessian D, Geladaki A, Blal HA, Alm T, Asplund A, Björk L, Breckels LM, et al. 2017. A subcellular map of the human proteome. Science 356(6340): eaal3321. doi: 10.1126/science.aal3321