REMYSOL

Effect of Talaromyces helicus and pollutants on the bacterial composition of soil microcosms

closed
collaboration
Authors
Affiliation

Olivier Rué

Migale bioinformatics facility

Christelle Hennequet-Antier

Migale bioinformatics facility

Published

July 3, 2024

Modified

February 19, 2025

Note

This document is a report of the analyses performed. You will find all the code used to analyze these data. The version of the tools (maybe in code chunks) and their references are indicated, for questions of reproducibility.

Aim of the project

The aim of the project is to study the impact of Talaromyces helicus and benzo[a]pyrene biodegradation on bacterial diversity.

Design

Soil microcosms in columns were carried out under 6 different conditions in triplicate.

3 incubation times were used, and the columns were sampled over several centimetres of soil depth.

Partners

  • Olivier Rué - Migale bioinformatics facility - BioInfomics - INRAE
  • Christelle Hennequet-Antier - Migale bioinformatics facility - BioInfomics - INRAE
  • Salomé Bertone - UTC - Sorbonne-Université

Deliverables

Deliverables agreed at the preliminary meeting (Table 1).

Table 1: Deliverables
  Definition Status
1 HTML report ✔️
2 Study of diversity as a function of the 3 factors (with priority given to those degrading PAHs)
3 Differential abundance between 3 factors - Heatmap
4 Functional analysis

Data management

Important

All data is managed by the migale facility for the duration of the project. Once the project is over, the Migale facility does not keep your data. We will provide you with the raw data and associated metadata that will be deposited on public repositories before the results are used. We can guide you in the submission process. We will then decide which files to keep, knowing that this report will also be provided to you and that the analyses can be replayed if needed.

Raw data

Bioinformatics was performed by Salomé using FROGS with Galaxy.

Load the phyloseq object

BIOM file, tree file and metadata file were downloaded from Galaxy

The phyloseq package [1] is a tool to import, store, analyze, and graphically display complex metabarcoding data, especially when there is associated sample data, phylogenetic tree, and/or taxonomic assignment of the OTUs or ASVs. Various customs functions written to enhance the base functions of phyloseq are available in the phyloseq-extended package [2].

Code
if(!file.exists("html/physeq.rds")){
  physeq <- import_frogs("data/remysol.biom", taxMethod = "blast")
  metadata <- read.table("data/metadata.tsv", row.names = 1, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  sample_data(physeq) <- metadata
  phy_tree(physeq) <- read_tree("data/remysol.nwk")
  saveRDS(object = physeq, file="html/physeq.rds")
}else{
  physeq <- readRDS("html/physeq.rds")
}

physeq
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2259 taxa and 162 samples ]
sample_data() Sample Data:       [ 162 samples by 6 sample variables ]
tax_table()   Taxonomy Table:    [ 2259 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 2259 tips and 2258 internal nodes ]

Taxonomy modifications

Some taxonomic information is not enough precise and need to be modified. This is related to species containing PAH-ring hydroxylating dioxygenases.

Code
#list taxa of interest: containing PAH-ring hydroxylating dioxygenases
listPAH <- read.xlsx("./data/list_Species_PAH-ring.xlsx")

The genus of this ASV was completed.

Code
#manually curated
tax_table(physeq)["Cluster_145", "Genus"] <- "Terrabacter"
tax_table(physeq)["Cluster_145",]
Taxonomy Table:     [1 taxa by 7 taxonomic ranks]:
            Kingdom    Phylum             Class            Order          
Cluster_145 "Bacteria" "Actinobacteriota" "Actinobacteria" "Micrococcales"
            Family               Genus         Species          
Cluster_145 "Intrasporangiaceae" "Terrabacter" "Terrabacter sp."

Biostatistics

Filtering

Two samples named blanc and T0B122PMF1.3 (without design experimental information) were removed. Samples from Soil + B(a)P 13C and Soil + B(a)P 13C + T. helicus were not studied.

Code
physeq <- subset_samples(physeq, sample_names(physeq) %notin% c("blanc", "T0B122PMF1.3"))
physeq <- physeq %>% 
  microViz::ps_filter(Condition != "Soil + B(a)P 13C ") %>%
  microViz::ps_filter(Condition != "Soil + B(a)P 13C + T. helicus")
Code
sample_data(physeq) <- sample_data(physeq) %>%
  as_tibble() %>% 
  mutate(
    RowName = as.character(SampleName),
    SampleName = as.character(SampleName),
    ConditionName = as.character(ConditionName),
    Condition = factor(Condition, 
                       levels = c("Soil", "Soil + B(a)P 12C ", "Soil + B(a)P 13C ", "Soil + T. helicus", "Soil + B(a)P 12C + T. helicus",  "Soil + B(a)P 13C + T. helicus"), 
                       ordered = TRUE),
    Deep = factor(Deep, 
                  levels = c("0-1,25 cm ± 0,25 ", "1,25-2,5 cm ± 0,25 ", "2,5-3,5 cm ± 0,25 "),
                  ordered = TRUE),
    Replicat = as.character(Replicat),
    Incubation = factor(Incubation,
                        levels = c("0", "5", "9"),
                        ordered = TRUE)
  )  %>%
column_to_rownames(var="RowName")

physeq <- physeq %>% microViz::ps_arrange(Condition, Incubation, Deep)
physeq 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2258 taxa and 106 samples ]
sample_data() Sample Data:       [ 106 samples by 6 sample variables ]
tax_table()   Taxonomy Table:    [ 2258 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 2258 tips and 2257 internal nodes ]

Useful informations about sequencing from physeqobject.

Code
microbiome::summarize_phyloseq(physeq)
[[1]]
[1] "1] Min. number of reads = 23891"

[[2]]
[1] "2] Max. number of reads = 93584"

[[3]]
[1] "3] Total number of reads = 5039267"

[[4]]
[1] "4] Average number of reads = 47540.2547169811"

[[5]]
[1] "5] Median number of reads = 46606.5"

[[6]]
[1] "7] Sparsity = 0.177757908986079"

[[7]]
[1] "6] Any OTU sum to 1 or less? NO"

[[8]]
[1] "8] Number of singletons = 0"

[[9]]
[1] "9] Percent of OTUs that are singletons \n        (i.e. exactly one read detected across all samples)0"

[[10]]
[1] "10] Number of sample variables are: 6"

[[11]]
[1] "SampleName"    "ConditionName" "Condition"     "Deep"         
[5] "Replicat"      "Incubation"   

Experimental design

Code
# exprimental design table 
sample_data(physeq)%>% 
  datatable()

Number of replicates per condition

Code
# count number of samples per condition
sample_data(physeq) %>% 
  as_tibble() %>% 
  dplyr::count(Condition) 
# A tibble: 4 × 2
  Condition                           n
  <ord>                           <int>
1 "Soil"                             27
2 "Soil + B(a)P 12C "                26
3 "Soil + T. helicus"                27
4 "Soil + B(a)P 12C + T. helicus"    26
Code
# library(magrittr)
# sample_data(physeq) %$%
#   table(ConditionName, Incubation, Deep)

sample_data(physeq) %>% 
  as_tibble() %>% 
  dplyr::add_count(Condition) %>%
  dplyr::select(Condition, n) %>%
  distinct() %>%
  ggplot(., aes(x = Condition, y = n, fill = Condition)) + 
  geom_bar(stat = "identity") +
  labs(title="Number of replicates", y = "n") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 90, size = 8, hjust = 1))

Rarefaction

The common practice for comparing microbiome samples of different library sizes is to reduce the size of the larger samples using rarefaction until they contain the same number of reads as the smallest sample. The rarefaction curves show the species richness (i.e. number of species) against the library size and the vertical grey line indicates the smallest sample’s size. Each curve indicates when the sampling depth is sufficient to achieve an acceptable number of species discoveries and when increasing the depth leads to the discovery of rarer species. Rarefied data are used to estimate and compare alpha and beta diversity across samples.

Code
phyloseq::plot_bar(physeq, x = "SampleName", fill = "Phylum") + 
  facet_grid(~ Condition, scales = "free_x", space = "free") +
  theme(axis.text.x = element_text(size = 6, hjust = 1)) +
  theme(legend.position="top")

Remove samples with low sequencing.

Code
# sample_sums(physeq) %>% sort() %>% head(n=5)
# remove samples with less than nbreads_threshold 
nbreads_threshold  <- 30000
sample_sums(physeq)[!sample_sums(physeq) > nbreads_threshold]
T4B123P18F1.3 
        23891 
Code
physeq <- microViz::ps_filter(physeq, sample_sums(physeq) > nbreads_threshold)
Code
ggrare(physeq = physeq, step = 500, se = FALSE, color = "Condition", plot = FALSE, label = "SampleName", verbose = FALSE) + 
  # scale_colour_manual(values = group_pal) +
  geom_vline(xintercept = c(min(sample_sums(physeq))), color = "gray60") + 
  ggtitle("Rarefaction curves")

Code
physeq_rare <- phyloseq::rarefy_even_depth(physeq, rngseed = 2009, replace = TRUE)
Code
phyloseq::plot_bar(physeq_rare, fill="Phylum") + 
  facet_grid(~ Condition, scales = "free_x", space = "free") +
  theme(axis.text.x = element_text(size = 6, hjust = 1)) +
  theme(legend.position="top")

Results

One sample was removed due to a low number of reads, less than 30000.

After rarefaction, all depths are equal to 32306. The dataset contains 3392130 sequences allocated to 2258 taxa and 105 samples.

Taxonomic composition

Code
phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Kingdom", taxaSet1 = "Bacteria", taxaRank2 = "Phylum", numberOfTaxa = 10L) +
  facet_grid(~ Condition, scales = "free_x", space = "free") +
  theme(axis.text.x = element_text(size = 6, hjust = 1)) +
  theme(legend.position="top")

Results

Firmicutes, Actinobacteriota, Proteobacteria and Acidobacteriota are dominants in major samples.

Focus on the genus rank across these major phyla.

Code
phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Phylum", taxaSet1 = "Firmicutes", taxaRank2 = "Genus") +
  facet_grid( ~ Condition, scales = "free_x", space = "free_x") + 
  labs(title = "Relative composition", subtitle = "Firmicutes")
Problematic taxa
                   taxa  Kingdom     Phylum   Class      Order      Family
Cluster_519 Cluster_519 Bacteria Firmicutes Bacilli Bacillales Bacillaceae
                        Genus rank
Cluster_519 Multi-affiliation    6

Bacillus, Tumebacillus genera are mainly representated in Firmicutes.

Code
phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Phylum", taxaSet1 = "Actinobacteriota", taxaRank2 = "Genus") +
  facet_grid( ~ Condition, scales = "free_x", space = "free_x") + 
  labs(title = "Relative composition", subtitle = "Actinobacteriota")
Problematic taxa
                 taxa  Kingdom           Phylum           Class
Cluster_2   Cluster_2 Bacteria Actinobacteriota  Actinobacteria
Cluster_8   Cluster_8 Bacteria Actinobacteriota Thermoleophilia
Cluster_84 Cluster_84 Bacteria Actinobacteriota Thermoleophilia
Cluster_10 Cluster_10 Bacteria Actinobacteriota       MB-A2-108
                         Order         Family             Genus rank
Cluster_2        Micrococcales Micrococcaceae Multi-affiliation    2
Cluster_8           Gaiellales        Unknown           Unknown    4
Cluster_84 Solirubrobacterales          67-14           Unknown    6
Cluster_10             Unknown        Unknown           Unknown    7

The major genera in Actinobacteriota are Nocardioides, Mycobacterium, Rhodococcus, Gaiella and streptomyces depending on samples and conditions.

Code
phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Phylum", taxaSet1 = "Proteobacteria", taxaRank2 = "Genus") +
  facet_grid( ~ Condition, scales = "free_x", space = "free_x") + 
  labs(title = "Relative composition", subtitle = "Proteobacteria")
Problematic taxa
                 taxa  Kingdom         Phylum               Class       Order
Cluster_5   Cluster_5 Bacteria Proteobacteria Alphaproteobacteria Rhizobiales
Cluster_13 Cluster_13 Bacteria Proteobacteria Alphaproteobacteria Rhizobiales
Cluster_12 Cluster_12 Bacteria Proteobacteria Alphaproteobacteria Rhizobiales
                       Family             Genus rank
Cluster_5   Xanthobacteraceae           Unknown    2
Cluster_13 Methyloligellaceae           Unknown    3
Cluster_12  Xanthobacteraceae Multi-affiliation    5

The most representative known genera are Sphingomonas and Lysobacter depending on samples and conditions.

Code
phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Phylum", taxaSet1 = "Acidobacteriota", taxaRank2 = "Genus") +
  facet_grid( ~ Condition, scales = "free_x", space = "free_x") + 
  labs(title = "Relative composition", subtitle = "Acidobacteriota")
Problematic taxa
                   taxa  Kingdom          Phylum            Class
Cluster_58   Cluster_58 Bacteria Acidobacteriota Vicinamibacteria
Cluster_67   Cluster_67 Bacteria Acidobacteriota Vicinamibacteria
Cluster_499 Cluster_499 Bacteria Acidobacteriota       Holophagae
Cluster_366 Cluster_366 Bacteria Acidobacteriota   Blastocatellia
Cluster_126 Cluster_126 Bacteria Acidobacteriota   Blastocatellia
                         Order              Family   Genus rank
Cluster_58  Vicinamibacterales             Unknown Unknown    1
Cluster_67  Vicinamibacterales Vicinamibacteraceae Unknown    2
Cluster_499         Subgroup 7             Unknown Unknown    5
Cluster_366              11-24             Unknown Unknown    7
Cluster_126   Blastocatellales   Blastocatellaceae Unknown    8

We note the presence of RB41, Bryobacter, Candidatus Solibacter and Candidatus Koribacter in Acidobacteriota phylum.

Alpha-diversity

Alpha-diversity represents diversity within a sample. The most commonly used measures of diversity are : Observed, Chao1, Shannon, and Simpson. Richness (i.e. Observed) is simply the number of species present in the sample. Chao1 is an estimator of the number of species applied to the presence/absence data for each species. Shannon uses the abondance of different species and indicates richness and evenness. The Simpson index gives more importance to abundant species.

Code
# plot alpha diversity
phyloseq::plot_richness(physeq = physeq_rare, x = "Condition", color= "Incubation", title = "Alpha diversity graphics", measures = c("Observed", "Chao1", "Shannon", "Simpson", "InvSimpson"), nrow=5) + 
  geom_jitter() +
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
  theme(legend.position="top")

Code
# plot alpha diversity
phyloseq::plot_richness(physeq = physeq_rare, x = "Condition", color= "Deep", title = "Alpha diversity graphics", measures = c("Observed", "Chao1", "Shannon", "Simpson", "InvSimpson"), nrow=5) + 
  geom_jitter() +
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
  theme(legend.position="top")

Code
# plot alpha diversity 
p.alphadiversity <- microbiome::boxplot_alpha(physeq_rare, 
                           index = c("Observed"),
                           x_var = "Condition",
                           )
p.alphadiversity +
    facet_grid(cols = vars(Incubation, Deep), scales = "free_x", space = "free_x") + 
    theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
    theme(legend.position="top")

Code
# plot alpha diversity 
p.alphadiversity <- microbiome::boxplot_alpha(physeq_rare, 
                           index = c("Chao1"),
                           x_var = "Condition",
                           )
p.alphadiversity +
    facet_grid(cols = vars(Incubation, Deep), scales = "free_x", space = "free_x") + 
    theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
    theme(legend.position="top")

Code
# plot alpha diversity 
p.alphadiversity <- microbiome::boxplot_alpha(physeq_rare, 
                           index = c("Shannon"),
                           x_var = "Condition",
                           )
p.alphadiversity +
    facet_grid(cols = vars(Incubation, Deep), scales = "free_x", space = "free_x") + 
    theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
    theme(legend.position="top")

Code
# plot alpha diversity 
p.alphadiversity <- microbiome::boxplot_alpha(physeq_rare, 
                           index = c("Simpson"),
                           x_var = "Condition",
                           )
p.alphadiversity +
    facet_grid(cols = vars(Incubation, Deep), scales = "free_x", space = "free_x") + 
    theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
    theme(legend.position="top")

Results

The soil samples collected at Incubation = 0 contain the highest homogeneous richness. The richness of the samples collected at Incubation = 9 and with a high value of Deep = 2,5-3,5 cm ± 0,25, is as high as at Incubation = 0. Some samples had low alpha diversity leading to high variablity for Condition = Soil + B(a)P 12C, incubation = 5, Deep = 2,5-3,5 cm ± 0,25, Condition = Soil + T. helicus, incubation = 5, Deep = 2,5-3,5 cm ± 0,25 and Condition = Soil + B(a)P 12C + T. helicus, incubation = 9, Deep = 1,25-2,5 cm ± 0,25.

Alpha-diversity - statistical tests

We calculated the alpha-diversity indices and combined them with covariates from experimental design. A linear model is fitted to test the influence of each factor and the interactions on the alpha-diversity (here, Observed measure).

Code
div_data <- cbind(estimate_richness(physeq_rare),  ## diversity indices
                  sample_data(physeq_rare)         ## covariates
                  )

Model with interaction

Code
alphadiv.lm <- lm(Observed ~ Condition*Incubation*Deep, data = div_data)
anova(alphadiv.lm)
Analysis of Variance Table

Response: Observed
                          Df  Sum Sq Mean Sq F value   Pr(>F)   
Condition                  3  474300  158100  3.9452 0.011720 * 
Incubation                 2  577015  288507  7.1994 0.001446 **
Deep                       2  311336  155668  3.8845 0.025199 * 
Condition:Incubation       6  758814  126469  3.1559 0.008566 **
Condition:Deep             6  136854   22809  0.5692 0.753453   
Incubation:Deep            4  827291  206823  5.1610 0.001071 **
Condition:Incubation:Deep 12  555368   46281  1.1549 0.332695   
Residuals                 69 2765103   40074                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
# interaction visualisation
emmip(alphadiv.lm, Condition ~ Incubation, CIs = TRUE) + theme(axis.text.x = element_text(angle=45, hjust=1))

Code
emmip(alphadiv.lm, Condition ~ Deep, CIs = TRUE) + theme(axis.text.x = element_text(angle=45, hjust=1))

Code
emmip(alphadiv.lm, Incubation ~ Deep, CIs = TRUE) + theme(axis.text.x = element_text(angle=45, hjust=1))

Code
#tests 
EMM <- emmeans(alphadiv.lm, ~ Condition*Incubation*Deep)
# tests correction are performed within the variable
# P value adjustment: tukey method for comparing  
comp1 <- pairs(EMM, simple = "Condition",  adjust = "tukey")
# P value adjustment: tukey method for comparing 
comp2 <- pairs(EMM, simple = "Incubation",  adjust = "tukey")
# P value adjustment: tukey method for comparing 
comp3 <- pairs(EMM, simple = "Deep",  adjust = "tukey")
Results

The richness (i.e. Observed index) is influenced by the interaction Condition:Incubation, Incubation:Deep. The simple effects of Condition, Incubation and Deep must be interpret with the interaction. Plots allow to visualize significant interactions at 0.05 Condition:Incubation and Incubation:Deep on rhe estimated marginal means fitted of the observed alpha-diversity. The interaction Condition:Deep is not significant but a lower diversity is shown for Soil + B(a)P 12C + T. helicus samples.

Pairwise tests between Condition and Incubation were performed with post hoc Tukey’s test to determine which were significant. These comparisons were significant at 0.05 (See downloads file):

  • Incubation = 5, Deep = 2,5-3,5 cm ± 0,25: Soil - (Soil + B(a)P 12C + T. helicus)
  • Soil + B(a)P 12C + T. helicus, Deep = 2,5-3,5 cm ± 0,25: Incubation0 - Incubation5
  • Soil + B(a)P 12C + T. helicus, Deep = 2,5-3,5 cm ± 0,25 : Incubation5 - Incubation9

Beta-diversity

The dissimilarity between samples based on taxonomic composition is calculated with one of these Beta diversities: Bray-Curtis, Unweighted and weighted Unifrac, Jaccard index. The Bray-Curtis dissimilarity is calculated on counts and takes the value 0 when the composition of the samples est identical and the value 1 when the samples are completely disjoint. The Jaccard index uses presence/absence information only whereas UniFrac integrates information from the phylogenetic tree.

Code
dist.jac <- phyloseq::distance(physeq_rare, method = "cc")
dist.bc <- phyloseq::distance(physeq_rare, method = "bray")
dist.uf <- phyloseq::distance(physeq_rare, method = "unifrac")
dist.wuf <- phyloseq::distance(physeq_rare, method = "wunifrac")
distance_list <- list(
  "Jaccard"     = dist.jac,
  "Bray-Curtis" = dist.bc,
  "UniFrac"     = dist.uf,
  "wUniFrac"    = dist.wuf
)
Code
# Mahendra's function
# used own color's palette and shape
my_plot3 <- function(variable1, variable2, physeq = mydata_rare, shapeval, colorpal) {
  .plot <- function(distance) {
      .distance <- distance_list[[distance]] %>% as.matrix() %>% `[`(sample_names(physeq), sample_names(physeq)) %>% as.dist()
    p <- plot_ordination(physeq,
                    ordinate(physeq, method = "MDS", distance = .distance),
                    color = variable1, shape = variable2) + 
      scale_shape_manual(values = shapeval) +
      scale_color_manual(name = colorpal$name,
                         labels = colorpal$labels,
                         values = colorpal$values
                     ) +
      theme(legend.position = "bottom") +
      ggtitle(distance)
    # print(p)
  }
  plots <- lapply(names(distance_list), .plot)
 # legend <- cowplot::get_legend(plots[[1]] + theme(legend.position = "none"))
  shared_legend <- cowplot::get_plot_component(plots[[1]], "guide-box", return_all = TRUE)[[3]]
  plots <- lapply(plots, function(x){x + theme(legend.position = "none")})
  pgrid <- cowplot::plot_grid(plotlist = plots)
  cowplot::plot_grid(pgrid, shared_legend, ncol = 1, rel_heights = c(1, .1)) 
}
  
# Mahendra's function
# used own color's palette and shape
my_pcoa_with_arrows2 <- function(physeq = mydata_rare, variable1, variable2, shapeval, colorpal) {
  # select samples of interest where all samples were used to calculate the distances between samples 
  # dist.jac is previously calculated with all samples of the study
  #dist.c <- phyloseq::distance(physeq, "jaccard")
  dist.c <- dist.bc %>% as.matrix() %>% `[`(sample_names(physeq), sample_names(physeq)) %>% as.dist()
  ord.c <- ordinate(physeq, "MDS", dist.bc)
  p <- plot_ordination(physeq, ord.c)
  # ## Build arrow data
  plot_data <- p$data 
  arrow_data <- plot_data %>% 
    group_by(pick({{variable2}}, {{variable1}})) %>%
    arrange({{variable1}}) %>%
    mutate(xend = Axis.1, xstart = lag(xend),
           yend = Axis.2, ystart = lag(yend)
    )
  ## Plot with arrows
  p + aes(shape = .data[[ variable2  ]], color = .data[[ variable1  ]]) +
      scale_shape_manual(values = shapeval) +
      scale_color_manual(name = colorpal$name,
                         labels = colorpal$labels,
                         values = colorpal$values) +
    theme_bw() +
    geom_segment(data = arrow_data,
                 aes(x = xstart, y = ystart, xend = xend, yend = yend),
                 linewidth = 0.8, linetype = "3131",
                 #arrow = arrow(length=unit(0.1,"inches")),
                 show.legend = FALSE) +
    geom_point(size = 3) +
    theme(legend.position = "bottom")
}
Code
# MDS plot for samples of interests 
# from distance calculated with all samples of the complete study
# used https://colorbrewer2.org/#type=qualitative&scheme=Paired&n=12
Condition_pal <- list(
  name = "Condition",
  labels = c("Soil",  "Soil + B(a)P 12C" ,  "Soil + T. helicus",  "Soil + B(a)P 12C + T. helicus"),
  values = c("#A0522D", "#CDAA7D", "#CD853F", "#EEAD0E")
)
Incubation_pal <- list(
  name = "Incubation",
  labels = c("0", "5", "9"),
  values = c("#9ecae1", "#4292c6", "#08519c")
)
Deep_pal <- list(
  name = "Deep",
  labels = c('0-1,25 cm ± 0,25', '< 1,25-2,5 cm ± 0,25', '< 2,5-3,5 cm ± 0,25'),
  values = c( "#B3EE3A", "#9ACD32", "#458B00")
)  
  
Deep_shape <- list(values = c(1,2,5))
Incubation_shape <- list(values = c(16,17,15))
Condition_shape <- list(values = c(1,16,17,15))  
Code
# MDS plot
my_plot3(variable1 = "Incubation",
        variable2 = "Condition",
        physeq = physeq_rare,
        shapeval = Condition_shape$values,
        colorpal = Incubation_pal
        )

Code
# MDS plot
my_plot3(variable1 = "Incubation",
        variable2 = "Deep",
        physeq = physeq_rare,
        shapeval = Deep_shape$values,
        colorpal = Incubation_pal
        ) 

Code
# MDS plot
my_plot3(variable1 = "Condition",
        variable2 = "Incubation",
        physeq = physeq_rare,
        shapeval = Incubation_shape$values,
        colorpal = Condition_pal
        )

Code
# MDS plot
my_plot3(variable1 = "Condition",
        variable2 = "Deep",
        physeq = physeq_rare,
        shapeval = Deep_shape$values,
        colorpal = Condition_pal
        )  

Code
# MDS plot
my_plot3(variable1 = "Deep",
        variable2 = "Condition",
        physeq = physeq_rare,
        shapeval = Condition_shape$values,
        colorpal = Deep_pal
        )  

Here, we focused on the Multidimensional scaling (MDS / PCoA) based on Bray-Curtis distance and highlighted the effects of two significant interactions.

Code
# MDS plot with arrows
# effet le plus significatif est incubation
p <- my_pcoa_with_arrows2(variable1 = "Incubation",
        variable2 = "Condition",
        physeq = physeq_rare,
        shapeval = Condition_shape$values,
        colorpal = Incubation_pal)
p

Code
# MDS plot with arrows
# effet le plus significatif est incubation
p <- my_pcoa_with_arrows2(variable1 = "Incubation",
        variable2 = "Deep",
        physeq = physeq_rare,
        shapeval = Deep_shape$values,
        colorpal = Incubation_pal)
p

MDS plot

The Multidimensional scaling (MDS / PCoA) based on Bray-Curtis distance showed that samples were firstly structured according to Incubation as the most significant variable in the alpha-diversity test. Different graphs help to visualize the effects of the interactions between Condition, Incubation and Deep.

Beta-diversity - Tests

Code
metadata <- sample_data(physeq_rare) %>% as("data.frame")
# model <- vegan::adonis2(dist.bc ~ Condition*Incubation*Deep, data = metadata, permutations = 9999)
# print(model)
model <- vegan::adonis2(dist.bc ~ Incubation + Condition + Deep + Incubation:Condition + Incubation:Deep + Condition:Deep, data = metadata, permutations = 9999)
print(model)
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 9999

vegan::adonis2(formula = dist.bc ~ Incubation + Condition + Deep + Incubation:Condition + Incubation:Deep + Condition:Deep, data = metadata, permutations = 9999)
          Df SumOfSqs      R2      F Pr(>F)    
Model     23   8.9447 0.60512 5.3967  1e-04 ***
Residual  81   5.8370 0.39488                  
Total    104  14.7817 1.00000                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Results

The change in microbiota composition measured by the Bray-Curtis Beta diversity is not significant for the interaction Condition*Incubation*Deep (not shown). So, the model was fitted without it. All effects were significant except for Condition:Deep at 0.05 significance level.

Sample clustering

Code
par(mar = c(3, 0, 2, 0), cex = 0.6)
phyloseq.extended::plot_clust(physeq_rare, dist = "bray", method = "ward.D2", color = "Incubation",
           title = "Clustering of samples (Bray-Curtis + Ward)")

Code
par(mar = c(3, 0, 2, 0), cex = 0.6)
phyloseq.extended::plot_clust(physeq_rare, dist = "bray", method = "ward.D2", color = "Condition",
           title = "Clustering of samples (Bray-Curtis + Ward)")

Code
par(mar = c(3, 0, 2, 0), cex = 0.6)
phyloseq.extended::plot_clust(physeq_rare, dist = "bray", method = "ward.D2", color = "Deep",
           title = "Clustering of samples (Bray-Curtis + Ward)")

Results

Hierarchical clustering based on the Bray-Curtis dissimilarity and the Ward aggregation criterion shows that sample composition is not entirely structured by one of the factors Condition, Incubation and Deep due to the interactions. However, as previous results have shown, structuring is best explained by Incubation.

Heatmap using ordination

At Phylum taxonomic levels, we created heatmap plot of abundances using ordination methods to organize the rows with Bray-Curtis dissimilarity and Multidimensional scaling (MDS).

Code
#  method = NULL if not ordination
p <- phyloseq::plot_heatmap(tax_glom(physeq_rare, taxrank="Phylum"), 
                  method = "MDS",
                  distance = "bray",
                  sample.order= "SampleName", # not ordination
                  taxa.label = "Phylum") + 
  facet_grid(~ Incubation + Condition, scales = "free_x", space = "free_x") + 
  scale_fill_gradient2(low = "#1a9850", mid = "#ffffbf", high = "#d73027",
                       na.value = "white", trans = scales::log_trans(4),
                       midpoint = log(100, base = 4))
p

Differential abundance analysis

The abundances data was not rarefied to performed differential abundance analysis. We are interested to identify taxa that were differentially abundant between Condition at Deepand Incubation fixed.

For convenience, the levels Soil, Soil + B(a)P 12C, Soil + T. helicus, Soil + B(a)P 12C + T. helicus, of the Condition factor have been relabelled as Soil, Soil_12C, Soil_heli, Soil_12C_heli, respectively. And the levels of 0-1,25 cm ± 0,25, 1,25-2,5 cm ± 0,25, 2,5-3,5 cm ± 0,25 of Deep factor have been relabelled as L, M, H, respectively.

Code
physeqbis <- physeq

sample_data(physeqbis) <- sample_data(physeqbis) %>%
  as_tibble() %>% 
  mutate(
    RowName = as.character(SampleName),
    Condition = factor(Condition, 
                       levels = c("Soil", "Soil + B(a)P 12C ", "Soil + T. helicus", "Soil + B(a)P 12C + T. helicus"), 
                       labels = c("Soil", "Soil_12C", "Soil_heli", "Soil_12C_heli"),
                       ordered = TRUE),
    Deep = factor(Deep, 
                  levels = c("0-1,25 cm ± 0,25 ", "1,25-2,5 cm ± 0,25 ", "2,5-3,5 cm ± 0,25 "),
                  labels = c("L", "M", "H"),
                  ordered = TRUE),
    Replicat = as.character(Replicat),
    Incubation = factor(Incubation,
                        levels = c("0", "5", "9"),
                        ordered = TRUE),
    Group = factor(paste(Deep, Incubation, Condition, sep = "_"))
  )  %>%
column_to_rownames(var="RowName")

physeqbis <- physeqbis %>% microViz::ps_arrange(Group)
physeqbis 
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2258 taxa and 105 samples ]
sample_data() Sample Data:       [ 105 samples by 7 sample variables ]
tax_table()   Taxonomy Table:    [ 2258 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 2258 tips and 2257 internal nodes ]

Calculate the minimum number of replicates per Group.

Code
# calculate the minimum number of replicates
minreplicates <- sample_data(physeqbis) %>% 
  as_tibble() %>% 
  dplyr::add_count(Group) %>%
  dplyr::select(Group, n) %>%
  dplyr::summarise( min_nbreplicates = min(n))
minreplicates
# A tibble: 1 × 1
  min_nbreplicates
             <int>
1                2
Code
# from Mahendra's functions
# useful functions to perform a differential abundance analysis (daa)
# filter taxa all zeros in all samples and 
# keep taxa with counts > 10 in at least the number of replicates - possibility to choose
my_filter_phyloseq <- function(physeq, nreplicates = minreplicates) {
    ii <- phyloseq::genefilter_sample(physeq, function(x) x > 10, nreplicates)
  phyloseq::prune_taxa(ii, physeq)
}

# realise differential abundance results (daa) for the contrast
# Note to specify a contrast as a list
my_extract_results <- function(deseq, test = "Wald", contrast) {
  DESeq2::results(deseq, test = test, tidy = TRUE, contrast = contrast)  %>%
    dplyr::rename(ASV = row) %>% 
 #   dplyr::filter(padj <= 0.05) %>% 
    dplyr::arrange(log2FoldChange)
}

# formatting the daa result by adding ASV informations into daa object
my_format_results <- function(data, physeq) {
  phyloseq::tax_table(physeq)@.Data %>% as_tibble() %>% mutate(ASV = rownames(phyloseq::tax_table(physeq)@.Data)) %>% 
    dplyr::right_join(., data, by = "ASV") %>% 
    mutate(ASV = factor(ASV, levels = data$ASV))
}

# combine my_extract_results and my_format_results into an unique function
# realise differential abundance results (daa) for the contrast
# Note to specify a contrast as a list
my_daa_results <- function(deseq, test = "Wald", physeq, contrast) {
  # provide differential abundance results (daa) for the contrast
  # sort by log2FoldChange
  daa <- DESeq2::results(deseq, test = test, tidy = TRUE, contrast = contrast)  %>%
    dplyr::rename(ASV = row) %>% 
    dplyr::filter(padj <= 0.05) 
  # add ASV informations into daa object
    daa_new <- phyloseq::tax_table(physeq)@.Data %>% as_tibble() %>% mutate(ASV = rownames(phyloseq::tax_table(physeq)@.Data)) %>% 
    dplyr::right_join(., daa, by = "ASV") %>% 
    mutate(ASV = factor(ASV, levels = daa$ASV)) %>%
    dplyr::arrange(log2FoldChange)
    return(daa_new)
}

Abundances were agglomerated at Genus taxa rank.

Code
physeq_filter <- physeqbis %>% 
  phyloseq::tax_glom(., taxrank="Genus")  %>% 
  my_filter_phyloseq(., nreplicates = 2)
physeq_filter
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 373 taxa and 105 samples ]
sample_data() Sample Data:       [ 105 samples by 7 sample variables ]
tax_table()   Taxonomy Table:    [ 373 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 373 tips and 372 internal nodes ]
Differential abundance analysis

The differential abundance analysis (DAA) is based on a Generalized Linear Model (GLM) to take into account the distribution of count-type data (non-normality of the data) and over-dispersion. The regression coefficients of the factors are estimated in the model and Wald test was used to identify DAA ASV’s in a defined contrast.

Using DESeq2 R package, the GLM model [3] was fitted with one factor combining the factors of interest. A Benjamini-Hochberg false discovery rate (FDR) correction [4] was applied on pvalues.

Code
# do not rarefy
# at defined level taxrank
# filtering low abundance: less 10 in at least nreplicates
cds <- phyloseq::phyloseq_to_deseq2(physeq_filter,
                                    ~ 0 + Group)
SummarizedExperiment::rowData(cds) <- phyloseq::tax_table(physeq_filter)
#rownames(SummarizedExperiment::colData(cds)) <- SummarizedExperiment::colData(cds)$SampleLabel

# cds is a summarized experiment object SE
# experimental design
# colData(cds)
# abundances
# assays(cds)$counts
# fit Negative Binomial GLM model
dds <- DESeq2::DESeq(cds, sfType = "poscounts", fitType='local', test="Wald")
dds_coeff <- DESeq2::resultsNames(dds)

Differential abundance analysis between Condition

Code
# define interesting contrasts
contrast_Condition <- list(
"H0_Soil_12C_vs_Soil" = c("GroupH_0_Soil_12C", "GroupH_0_Soil"),
"H0_Soil_12C_heli_vs_Soil_heli" = c("GroupH_0_Soil_12C_heli", "GroupH_0_Soil_heli"),
"H0_Soil_heli_vs_Soil" = c("GroupH_0_Soil_heli", "GroupH_0_Soil"),
"H0_Soil_12C_heli_vs_Soil_12C" = c("GroupH_0_Soil_12C_heli", "GroupH_0_Soil_12C"),

"H5_Soil_12C_vs_Soil" = c("GroupH_5_Soil_12C", "GroupH_5_Soil"),
"H5_Soil_12C_heli_vs_Soil_heli" = c("GroupH_5_Soil_12C_heli", "GroupH_5_Soil_heli"),
"H5_Soil_heli_vs_Soil" = c("GroupH_5_Soil_heli", "GroupH_5_Soil"),
"H5_Soil_12C_heli_vs_Soil_12C" = c("GroupH_5_Soil_12C_heli", "GroupH_5_Soil_12C"),

"H9_Soil_12C_vs_Soil" = c("GroupH_9_Soil_12C", "GroupH_9_Soil"),
"H9_Soil_12C_heli_vs_Soil_heli" = c("GroupH_9_Soil_12C_heli", "GroupH_9_Soil_heli"),
"H9_Soil_heli_vs_Soil" = c("GroupH_9_Soil_heli", "GroupH_9_Soil"),
"H9_Soil_12C_heli_vs_Soil_12C" = c("GroupH_9_Soil_12C_heli", "GroupH_9_Soil_12C"),

"M0_Soil_12C_vs_Soil" = c("GroupM_0_Soil_12C", "GroupM_0_Soil"),
"M0_Soil_12C_heli_vs_Soil_heli" = c("GroupM_0_Soil_12C_heli", "GroupM_0_Soil_heli"),
"M0_Soil_heli_vs_Soil" = c("GroupM_0_Soil_heli", "GroupM_0_Soil"),
"M0_Soil_12C_heli_vs_Soil_12C" = c("GroupM_0_Soil_12C_heli", "GroupM_0_Soil_12C"),

"M5_Soil_12C_vs_Soil" = c("GroupM_5_Soil_12C", "GroupM_5_Soil"),
"M5_Soil_12C_heli_vs_Soil_heli" = c("GroupM_5_Soil_12C_heli", "GroupM_5_Soil_heli"),
"M5_Soil_heli_vs_Soil" = c("GroupM_5_Soil_heli", "GroupM_5_Soil"),
"M5_Soil_12C_heli_vs_Soil_12C" = c("GroupM_5_Soil_12C_heli", "GroupM_5_Soil_12C"),

"M9_Soil_12C_vs_Soil" = c("GroupM_9_Soil_12C", "GroupM_9_Soil"),
"M9_Soil_12C_heli_vs_Soil_heli" = c("GroupM_9_Soil_12C_heli", "GroupM_9_Soil_heli"),
"M9_Soil_heli_vs_Soil" = c("GroupM_9_Soil_heli", "GroupM_9_Soil"),
"M9_Soil_12C_heli_vs_Soil_12C" = c("GroupM_9_Soil_12C_heli", "GroupM_9_Soil_12C"),

"L0_Soil_12C_vs_Soil" = c("GroupL_0_Soil_12C", "GroupL_0_Soil"),
"L0_Soil_12C_heli_vs_Soil_heli" = c("GroupL_0_Soil_12C_heli", "GroupL_0_Soil_heli"),
"L0_Soil_heli_vs_Soil" = c("GroupL_0_Soil_heli", "GroupL_0_Soil"),
"L0_Soil_12C_heli_vs_Soil_12C" = c("GroupL_0_Soil_12C_heli", "GroupL_0_Soil_12C"),

"L5_Soil_12C_vs_Soil" = c("GroupL_5_Soil_12C", "GroupL_5_Soil"),
"L5_Soil_12C_heli_vs_Soil_heli" = c("GroupL_5_Soil_12C_heli", "GroupL_5_Soil_heli"),
"L5_Soil_heli_vs_Soil" = c("GroupL_5_Soil_heli", "GroupL_5_Soil"),
"L5_Soil_12C_heli_vs_Soil_12C" = c("GroupL_5_Soil_12C_heli", "GroupL_5_Soil_12C"),

"L9_Soil_12C_vs_Soil" = c("GroupL_9_Soil_12C", "GroupL_9_Soil"),
"L9_Soil_12C_heli_vs_Soil_heli" = c("GroupL_9_Soil_12C_heli", "GroupL_9_Soil_heli"),
"L9_Soil_heli_vs_Soil" = c("GroupL_9_Soil_heli", "GroupL_9_Soil"),
"L9_Soil_12C_heli_vs_Soil_12C" = c("GroupL_9_Soil_12C_heli", "GroupL_9_Soil_12C")
)


# realize a daa for a list of contrasts
# list of differential analyses results for each contrast
# daa_deseq2_Condition <- contrast_Condition |>
#   map(\(x) my_extract_results(dds, contrast = list(x)))
# daa_deseq2_Condition

# extract result for a fixed contrast
# daa_deseq2 <- my_extract_results(dds, contrast = list(contrast_Condition[[1]]))
# format dda result for a fixed contrast
# daa_res <- my_format_results(data = daa_deseq2, physeq = physeq_filter)

# combine the previous functions
# realize a differential abunbdance analyses (dda) for a list of contrasts
# format dda: adding ASV informations and sorting by log2FoldChange
daa_deseq2_Condition <- contrast_Condition |>
  map(\(x) my_daa_results(dds, test = "Wald", physeq = physeq_filter, contrast = list(x))) 

#example for one contrast
#res <- my_daa_results(dds, test = "Wald", contrast = list(contrast_Condition[[1]]), physeq = physeq_filter)

This is the number of differentially abundant ASV (at genus) obtained for each tested contrast: pairwise contrast between the levels of factor of interest.

Code
# number of taxomic group daa for each tested contrast
daa_deseq2_Condition |> map_df(nrow) |>
  t() |> 
  kbl(col.names = c("Nb of diff. abundant ASV (at genus)")) |> kable_styling()  |>
  scroll_box(height = "600px")
Nb of diff. abundant ASV (at genus)
H0_Soil_12C_vs_Soil 326
H0_Soil_12C_heli_vs_Soil_heli 326
H0_Soil_heli_vs_Soil 328
H0_Soil_12C_heli_vs_Soil_12C 323
H5_Soil_12C_vs_Soil 335
H5_Soil_12C_heli_vs_Soil_heli 221
H5_Soil_heli_vs_Soil 339
H5_Soil_12C_heli_vs_Soil_12C 226
H9_Soil_12C_vs_Soil 335
H9_Soil_12C_heli_vs_Soil_heli 339
H9_Soil_heli_vs_Soil 334
H9_Soil_12C_heli_vs_Soil_12C 334
M0_Soil_12C_vs_Soil 318
M0_Soil_12C_heli_vs_Soil_heli 332
M0_Soil_heli_vs_Soil 321
M0_Soil_12C_heli_vs_Soil_12C 333
M5_Soil_12C_vs_Soil 343
M5_Soil_12C_heli_vs_Soil_heli 307
M5_Soil_heli_vs_Soil 340
M5_Soil_12C_heli_vs_Soil_12C 305
M9_Soil_12C_vs_Soil 337
M9_Soil_12C_heli_vs_Soil_heli 324
M9_Soil_heli_vs_Soil 329
M9_Soil_12C_heli_vs_Soil_12C 335
L0_Soil_12C_vs_Soil 314
L0_Soil_12C_heli_vs_Soil_heli 323
L0_Soil_heli_vs_Soil 307
L0_Soil_12C_heli_vs_Soil_12C 306
L5_Soil_12C_vs_Soil 317
L5_Soil_12C_heli_vs_Soil_heli 320
L5_Soil_heli_vs_Soil 323
L5_Soil_12C_heli_vs_Soil_12C 306
L9_Soil_12C_vs_Soil 319
L9_Soil_12C_heli_vs_Soil_heli 312
L9_Soil_heli_vs_Soil 307
L9_Soil_12C_heli_vs_Soil_12C 326

Heatmap based on Condition contrasts

Heatmap

After the differential abundance analysis, a hierarchical clustering followed by a heatmap plot [5] were performed on DA ASV in at least one of the comparisons, with ComplexHeatmap R package.

Abundances matrix was transformed using vst function from DESeq2 R package to stabilize the variance and take into account for sequencing bias. Rows (=ASV) were centered and divided by the standard deviation.

Hierarchical clustering was performed for both rows and columns in two steps: 1/ calculate the pearson distance (1 - corr) and 2/ apply a clustering method with the Ward’s distance.

Code
#vector containing DAA ASV in at least one of contrast
daa_asv <- daa_deseq2_Condition |>
  map_df(\(x) dplyr::select(x, ASV)) |> 
  t() |>
  as.character() |>
  unique() 
Code
# abundances were agglomerated by rank = Genus
# calculate a variance stabilizing transformation VST
# rows (= ASV) were centered and scaled 
vsd <- DESeq2::varianceStabilizingTransformation(dds, fitType="local", blind = FALSE)
SummarizedExperiment::assay(vsd) <- SummarizedExperiment::assay(vsd)|> 
  t() |>
  scale(center = TRUE, scale = TRUE) |>
  t()

# select daa ASV
vsd <- vsd[daa_asv,]
#colData(vsd)
#assay(vsd)
#assay(vsd)["Cluster_xxx",]
Code
# heatmap for DAA ASV (at least significant for one contrast)
nbclusters <- 7
# main title 
mytitle <- paste("Heatmap dist(1-pearson) + ward.D2 ", " n=", nrow(vsd), " ASV" ,sep="")

#color cells
cellcolor_funct = circlize::colorRamp2(seq(-4, 4, length = 3), c("blue", "#EEEEEE", "red"))

# faster hclust
ht_opt$fast_hclust = TRUE
ht_opt$message = FALSE

# column annotation
anno_df = data.frame(
    Condition = SummarizedExperiment::colData(vsd)$Condition,
    Incubation = SummarizedExperiment::colData(vsd)$Incubation,
    Deep = SummarizedExperiment::colData(vsd)$Deep
)
column_ha = HeatmapAnnotation(df = anno_df,
    col = list(
          Condition = c("Soil" = "#A0522D", "Soil_12C" = "#CDAA7D", "Soil_heli" = "#CD853F", "Soil_12C_heli" = "#EEAD0E"),
          Incubation = c("0" = "#9ecae1", "5" = "#4292c6", "9" = "#08519c"),
          Deep = c("L" = "#B3EE3A", "M" = "#9ACD32", "H" = "#458B00")
    )
)

# row annotation
colPhylum <- viridis(n=length(unique(SummarizedExperiment::rowData(vsd)$Phylum)))
names(colPhylum) <- unique(SummarizedExperiment::rowData(vsd)$Phylum) 

row_ha = HeatmapAnnotation(
  name = "Phylum",
  phylum = SummarizedExperiment::rowData(vsd)$Phylum,
  col = list(phylum = colPhylum),
  which = "row"
)
                              
ht <- ComplexHeatmap::Heatmap(SummarizedExperiment::assay(vsd), 
                              name="abundance transformed", 
                              #abundance transformed vst and scaled by row
                              #col = cellcolor_funct, 
                              clustering_distance_rows = "pearson",
                              clustering_method_rows = "ward.D2", 
                              #clustering_distance_columns = "euclidean",
                              cluster_columns = TRUE,
                              clustering_distance_columns = "pearson", #seems UniFrac MDS
                              #clustering_distance_columns = "binary",  #seems JACCARD MDS
                              clustering_method_columns = "ward.D2", 
                              row_split = nbclusters,
                              row_dend_reorder = TRUE, 
                              show_row_names = FALSE, 
                              column_dend_reorder = TRUE, 
                              column_title = mytitle, 
                              column_names_centered = TRUE,
                              column_names_gp = gpar(fontsize = 6), 
                              # column_split = mycolsplit,
                              top_annotation = column_ha,
                              right_annotation = row_ha
                              )
draw(ht)

Results

The clustering revealed 7 groups of DA ASV between Condition contrasts that share the same abundance profile. As explained by previous analyses, the profiles are mainly characterized by Incubation. The samples were not clustered by Condition. Then, within a group, profiles are interpreted on the basis of striking samples corresponding to combinations of interest in the 3 factors. Please note: the number of groups has been set arbitrarily on the basis of dendrogram visualization.

Code
tmp <- cbind(ASV = rownames(ht@matrix), SummarizedExperiment::rowData(vsd)[, 1:7], ht@matrix)
excelsheets <- list(
  cluster1 = tmp[row_order(ht)[[1]], column_order(ht)],
  cluster2 = tmp[row_order(ht)[[2]], column_order(ht)],
  cluster3 = tmp[row_order(ht)[[3]], column_order(ht)],
  cluster4 = tmp[row_order(ht)[[4]], column_order(ht)],
  cluster5 = tmp[row_order(ht)[[5]], column_order(ht)],
  cluster6 = tmp[row_order(ht)[[6]], column_order(ht)],
  cluster7 = tmp[row_order(ht)[[7]], column_order(ht)]
)
write.xlsx(excelsheets, file = paste0("./html/", "vsd-matrix-Condition-bycluster", ".xlsx"))

Differential abundance analysis between incubation time

A new list of contrasts was established to identify differentially abundant ASV between the time of incubation at Condition and Deep fixed.

Code
# define interesting contrasts
# at fixed Condition and Deep between Incubation

contrast_Incubation <- list(
"H_Soil_9vs5"= c("GroupH_9_Soil", "GroupH_5_Soil"),
"H_Soil_5vs0"= c("GroupH_5_Soil", "GroupH_0_Soil"),
"H_Soil_9vs0"= c("GroupH_9_Soil", "GroupH_0_Soil"),
"H_Soil_12C_9vs5"= c("GroupH_9_Soil_12C", "GroupH_5_Soil_12C"),
"H_Soil_12C_5vs0"= c("GroupH_5_Soil_12C", "GroupH_0_Soil_12C"),
"H_Soil_12C_9vs0"= c("GroupH_9_Soil_12C", "GroupH_0_Soil_12C"),
"H_Soil_heli_9vs5"= c("GroupH_9_Soil_heli", "GroupH_5_Soil_heli"),
"H_Soil_heli_5vs0"= c("GroupH_5_Soil_heli", "GroupH_0_Soil_heli"),
"H_Soil_heli_9vs0"= c("GroupH_9_Soil_heli", "GroupH_0_Soil_heli"),
"H_Soil_12C_heli_9vs5"= c("GroupH_9_Soil_12C_heli", "GroupH_5_Soil_12C_heli"),
"H_Soil_12C_heli_5vs0"= c("GroupH_5_Soil_12C_heli", "GroupH_0_Soil_12C_heli"),
"H_Soil_12C_heli_9vs0"= c("GroupH_9_Soil_12C_heli", "GroupH_0_Soil_12C_heli"),

"M_Soil_9vs5"= c("GroupM_9_Soil", "GroupM_5_Soil"),
"M_Soil_5vs0"= c("GroupM_5_Soil", "GroupM_0_Soil"),
"M_Soil_9vs0"= c("GroupM_9_Soil", "GroupM_0_Soil"),
"M_Soil_12C_9vs5"= c("GroupM_9_Soil_12C", "GroupM_5_Soil_12C"),
"M_Soil_12C_5vs0"= c("GroupM_5_Soil_12C", "GroupM_0_Soil_12C"),
"M_Soil_12C_9vs0"= c("GroupM_9_Soil_12C", "GroupM_0_Soil_12C"),
"M_Soil_heli_9vs5"= c("GroupM_9_Soil_heli", "GroupM_5_Soil_heli"),
"M_Soil_heli_5vs0"= c("GroupM_5_Soil_heli", "GroupM_0_Soil_heli"),
"M_Soil_heli_9vs0"= c("GroupM_9_Soil_heli", "GroupM_0_Soil_heli"),
"M_Soil_12C_heli_9vs5"= c("GroupM_9_Soil_12C_heli", "GroupM_5_Soil_12C_heli"),
"M_Soil_12C_heli_5vs0"= c("GroupM_5_Soil_12C_heli", "GroupM_0_Soil_12C_heli"),
"M_Soil_12C_heli_9vs0"= c("GroupM_9_Soil_12C_heli", "GroupM_0_Soil_12C_heli"),

"L_Soil_9vs5"= c("GroupL_9_Soil", "GroupL_5_Soil"),
"L_Soil_5vs0"= c("GroupL_5_Soil", "GroupL_0_Soil"),
"L_Soil_9vs0"= c("GroupL_9_Soil", "GroupL_0_Soil"),
"L_Soil_12C_9vs5"= c("GroupL_9_Soil_12C", "GroupL_5_Soil_12C"),
"L_Soil_12C_5vs0"= c("GroupL_5_Soil_12C", "GroupL_0_Soil_12C"),
"L_Soil_12C_9vs0"= c("GroupL_9_Soil_12C", "GroupL_0_Soil_12C"),
"L_Soil_heli_9vs5"= c("GroupL_9_Soil_heli", "GroupL_5_Soil_heli"),
"L_Soil_heli_5vs0"= c("GroupL_5_Soil_heli", "GroupL_0_Soil_heli"),
"L_Soil_heli_9vs0"= c("GroupL_9_Soil_heli", "GroupL_0_Soil_heli"),
"L_Soil_12C_heli_9vs5"= c("GroupL_9_Soil_12C_heli", "GroupL_5_Soil_12C_heli"),
"L_Soil_12C_heli_5vs0"= c("GroupL_5_Soil_12C_heli", "GroupL_0_Soil_12C_heli"),
"L_Soil_12C_heli_9vs0"= c("GroupL_9_Soil_12C_heli", "GroupL_0_Soil_12C_heli")
)

daa_deseq2_Incubation <- contrast_Incubation |>
  map(\(x) my_daa_results(dds, test = "Wald", physeq = physeq_filter, contrast = list(x))) 

This is the number of differentially abundant ASV (at genus) obtained for each tested contrast: pairwise contrast between the levels of factor of interest.

Code
# number of taxomic group daa for each tested contrast
daa_deseq2_Incubation|> map_df(nrow) |>
  t() |> 
  kbl(col.names = c("Nb of diff. abundant ASV (at genus)")) |> kable_styling()  |>
  scroll_box(height = "600px")
Nb of diff. abundant ASV (at genus)
H_Soil_9vs5 330
H_Soil_5vs0 333
H_Soil_9vs0 335
H_Soil_12C_9vs5 335
H_Soil_12C_5vs0 329
H_Soil_12C_9vs0 329
H_Soil_heli_9vs5 340
H_Soil_heli_5vs0 339
H_Soil_heli_9vs0 331
H_Soil_12C_heli_9vs5 230
H_Soil_12C_heli_5vs0 230
H_Soil_12C_heli_9vs0 328
M_Soil_9vs5 339
M_Soil_5vs0 325
M_Soil_9vs0 321
M_Soil_12C_9vs5 340
M_Soil_12C_5vs0 339
M_Soil_12C_9vs0 337
M_Soil_heli_9vs5 339
M_Soil_heli_5vs0 339
M_Soil_heli_9vs0 334
M_Soil_12C_heli_9vs5 307
M_Soil_12C_heli_5vs0 308
M_Soil_12C_heli_9vs0 327
L_Soil_9vs5 307
L_Soil_5vs0 307
L_Soil_9vs0 305
L_Soil_12C_9vs5 317
L_Soil_12C_5vs0 309
L_Soil_12C_9vs0 311
L_Soil_heli_9vs5 325
L_Soil_heli_5vs0 332
L_Soil_heli_9vs0 319
L_Soil_12C_heli_9vs5 305
L_Soil_12C_heli_5vs0 314
L_Soil_12C_heli_9vs0 317

Heatmap based on Incubation contrasts

Heatmap

After the differential abundance analysis, a hierarchical clustering followed by a heatmap plot [5] were performed on DA ASV in at least one of the comparisons, with ComplexHeatmap R package.

Abundances matrix was transformed using vst function from DESeq2 R package to stabilize the variance and take into account for sequencing bias. Rows (=ASV) were centered and divided by the standard deviation.

Hierarchical clustering was performed for both rows and columns in two steps: 1/ calculate the pearson distance (1 - corr) and 2/ apply a clustering method with the Ward’s distance.

Code
#vector containing DAA ASV in at least one of contrast
daa_asv <- daa_deseq2_Incubation |>
  map_df(\(x) dplyr::select(x, ASV)) |> 
  t() |>
  as.character() |>
  unique() 
Code
# abundances were agglomerated by rank = Genus
# calculate a variance stabilizing transformation VST
# rows (= ASV) were centered and scaled 
vsd <- DESeq2::varianceStabilizingTransformation(dds, fitType="local", blind = FALSE)
SummarizedExperiment::assay(vsd) <- SummarizedExperiment::assay(vsd)|> 
  t() |>
  scale(center = TRUE, scale = TRUE) |>
  t()

# select daa ASV
vsd <- vsd[daa_asv,]
#colData(vsd)
#assay(vsd)
#assay(vsd)["Cluster_xxx",]
Code
# heatmap for DAA ASV (at least significant for one contrast)
nbclusters <- 5
# main title 
mytitle <- paste("Heatmap dist(1-pearson) + ward.D2 ", " n=", nrow(vsd), " ASV" ,sep="")

#color cells
cellcolor_funct = circlize::colorRamp2(seq(-4, 4, length = 3), c("blue", "#EEEEEE", "red"))

# faster hclust
ht_opt$fast_hclust = TRUE
ht_opt$message = FALSE

# column annotation
anno_df = data.frame(
    Condition = SummarizedExperiment::colData(vsd)$Condition,
    Incubation = SummarizedExperiment::colData(vsd)$Incubation,
    Deep = SummarizedExperiment::colData(vsd)$Deep
)
column_ha = HeatmapAnnotation(df = anno_df,
    col = list(
          Condition = c("Soil" = "#A0522D", "Soil_12C" = "#CDAA7D", "Soil_heli" = "#CD853F", "Soil_12C_heli" = "#EEAD0E"),
          Incubation = c("0" = "#9ecae1", "5" = "#4292c6", "9" = "#08519c"),
          Deep = c("L" = "#B3EE3A", "M" = "#9ACD32", "H" = "#458B00")
    )
)

# row annotation
colPhylum <- viridis(n=length(unique(SummarizedExperiment::rowData(vsd)$Phylum)))
names(colPhylum) <- unique(SummarizedExperiment::rowData(vsd)$Phylum) 

row_ha = HeatmapAnnotation(
  name = "Phylum",
  phylum = SummarizedExperiment::rowData(vsd)$Phylum,
  col = list(phylum = colPhylum),
  which = "row"
)
                              
ht <- ComplexHeatmap::Heatmap(SummarizedExperiment::assay(vsd), 
                              name="abundance transformed", 
                              #abundance transformed vst and scaled by row
                              #col = cellcolor_funct, 
                              clustering_distance_rows = "pearson",
                              clustering_method_rows = "ward.D2", 
                              #clustering_distance_columns = "euclidean",
                              cluster_columns = TRUE,
                              clustering_distance_columns = "pearson", #seems UniFrac MDS
                              #clustering_distance_columns = "binary",  #seems JACCARD MDS
                              clustering_method_columns = "ward.D2", 
                              row_split = nbclusters,
                              row_dend_reorder = TRUE, 
                              show_row_names = FALSE, 
                              column_dend_reorder = TRUE, 
                              column_title = mytitle, 
                              column_names_centered = TRUE,
                              column_names_gp = gpar(fontsize = 6), 
                              # column_split = mycolsplit,
                              top_annotation = column_ha,
                              right_annotation = row_ha
                              )
draw(ht)

Results

The clustering revealed 5 groups of DA ASV between Incubation contrasts that share the same abundance profile. As explained by previous analyses, the profiles are mainly characterized by Incubation. The samples were not clustered by Condition. Then, within a group, profiles are interpreted on the basis of striking samples corresponding to combinations of interest in the 3 factors. Please note: the number of groups has been set arbitrarily on the basis of dendrogram visualization.

Code
tmp <- cbind(ASV = rownames(ht@matrix), SummarizedExperiment::rowData(vsd)[, 1:7], ht@matrix)
excelsheets <- list(
  cluster1 = tmp[row_order(ht)[[1]], column_order(ht)],
  cluster2 = tmp[row_order(ht)[[2]], column_order(ht)],
  cluster3 = tmp[row_order(ht)[[3]], column_order(ht)],
  cluster4 = tmp[row_order(ht)[[4]], column_order(ht)],
  cluster5 = tmp[row_order(ht)[[5]], column_order(ht)]
)
write.xlsx(excelsheets, file = paste0("./html/", "vsd-matrix-Incubation-bycluster", ".xlsx"))

Heatmap on species containing PAH-ring hydroxylating dioxygenases

Abundance’s counts were agglomerated at specie taxonomic rank. Relative abundance was calculated by dividing by the sum of the counts.

Code
#list taxa of interest: containing PAH-ring hydroxylating dioxygenases
listPAH <- read.xlsx("./data/list_Species_PAH-ring.xlsx")
Code
# agglomarate at fixed taxonomic rank
physeq_PAH_sp <- fast_tax_glom(physeq, taxrank="Species")
#relative abundance
physeq_PAH_sp <- transform_sample_counts(physeq_PAH_sp, function(x) x/sum(x)*100)

Few of species contain PAH-ring hydroxylating dioxygenases were identified in the soil samples. In fact, the taxonomic assignation is not enough sufficient, with some unknown species” and “Multi-affiliation” at the specie taxonomic rank. List of species included in the list of bacteria containing PAH-ring hydroxylating dioxygenases.

Code
# species in common between soil of samples and the list of bacteria containing PAH-ring hydroxylating dioxygenases
ind <- tax_table(physeq_PAH_sp)[, "Species"] %in% listPAH$Specie
tax_table(physeq_PAH_sp)[ind, "Species"]
Taxonomy Table:     [4 taxa by 1 taxonomic ranks]:
             Species            
Cluster_2482 "Mycobacterium sp."
Cluster_22   "Rhodococcus sp."  
Cluster_145  "Terrabacter sp."  
Cluster_350  "Sphingomonas sp." 
Code
#subset taxa in PAH list: species within Genus of interest
physeq_PAH_sp <- subset_taxa(physeq_PAH_sp, Genus %in% listPAH$Genus)

So, we have considered genera provided from the list of bacteria that contain PAH-ring hydroxylating dioxygenases (file list_Species_PAH-ring.xlsx) to produce the heatmap plot.

The rows of heatmap plot correspond to the species identified in the soil samples and whose genus is included in the list of bacteria containing PAH-ring hydroxylating dioxygenases. This is the heatmap plot of the relative abundance of these species.

Code
my_otutable <- otu_table(physeq_PAH_sp) 
#cell color
cellcolor_funct <- colorRampPalette(c("lightyellow", "red"), space = "rgb")(50)

# row annotation
colGenus <- viridis(n=length(unique(tax_table(physeq_PAH_sp)[, "Genus"])))
names(colGenus) <- unique(tax_table(physeq_PAH_sp)[, "Genus"]) 

row_ha_PAH = HeatmapAnnotation(
  name = "Genus",
  Genus = tax_table(physeq_PAH_sp)[, "Genus"],
  col = list(Genus = colGenus),
  which = "row"
)

mytitle_PAH <- "Heatmap plot: relative abundance of species whose genus is identified PAH"

ht_PAH <- ComplexHeatmap::Heatmap(my_otutable, 
                              name="relative abundance", 
                              #relative abundance
                              col = cellcolor_funct, 
                              cluster_rows = FALSE,
                              cluster_columns = FALSE,
                              show_row_names = TRUE, 
                              column_title = mytitle_PAH, 
                              column_names_centered = TRUE,
                              column_names_gp = gpar(fontsize = 6), 
                              # column_split = mycolsplit,
                               top_annotation = column_ha,
                               right_annotation = row_ha_PAH
                              )
draw(ht_PAH)

Code
tmp_PAH <- cbind(ASV = rownames(ht_PAH@matrix), tax_table(physeq_PAH_sp), ht_PAH@matrix)

excelsheets <- list(rel_abundance = tmp_PAH)
write.xlsx(excelsheets, file = paste0("./html/", "relative_abundance_PAH", ".xlsx"))

Downloads

  Description

initial phyloseq object

Excel file containing the results of alpha-diversity tests

Excel file containing the results of differential abundance analysis concerning Condition contrasts

Excel file containing the profil of abundance by cluster (vsd matrix trannsformed) of ASV differentially abundant between Condition contrasts

Excel file containing the results of differential abundance analysis concerning Incubation contrasts

Excel file containing the profil of abundance by cluster (vsd matrix trannsformed) of ASV differentially abundant between Incubation contrasts

Excel file containing the relative abundance of species whose genus is included in the list of bacteria containing PAH-ring hydroxylating dioxygenases

Reproducibility token

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       Ubuntu 24.04.1 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  fr_FR.UTF-8
 ctype    fr_FR.UTF-8
 tz       Europe/Paris
 date     2025-02-18
 pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package           * version date (UTC) lib source
 ComplexHeatmap    * 2.20.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 cowplot           * 1.1.3   2024-01-22 [1] CRAN (R 4.4.0)
 data.table        * 1.16.4  2024-12-06 [1] CRAN (R 4.4.2)
 dplyr             * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
 DT                * 0.33    2024-04-04 [1] CRAN (R 4.4.0)
 emmeans           * 1.10.7  2025-01-31 [1] CRAN (R 4.4.2)
 forcats           * 1.0.0   2023-01-29 [1] CRAN (R 4.4.0)
 ggplot2           * 3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
 InraeThemes       * 3.0.0   2024-05-21 [1] Github (davidcarayon/InraeThemes@9ee5259)
 kableExtra        * 1.4.0   2024-01-24 [1] CRAN (R 4.4.0)
 lubridate         * 1.9.4   2024-12-08 [1] CRAN (R 4.4.2)
 openxlsx          * 4.2.8   2025-01-25 [1] CRAN (R 4.4.2)
 phyloseq          * 1.48.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 phyloseq.extended * 0.1.4.1 2024-05-21 [1] Github (mahendra-mariadassou/phyloseq-extended@150a871)
 purrr             * 1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
 readr             * 2.1.5   2024-01-10 [1] CRAN (R 4.4.0)
 stringr           * 1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
 tibble            * 3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
 tidyr             * 1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
 tidyverse         * 2.0.0   2023-02-22 [1] CRAN (R 4.4.0)
 viridis           * 0.6.5   2024-01-29 [1] CRAN (R 4.4.0)
 viridisLite       * 0.4.2   2023-05-02 [1] CRAN (R 4.4.0)

 [1] /home/orue/R/x86_64-pc-linux-gnu-library/4.4
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────

References

1. McMurdie PJ, Holmes S. Phyloseq: An r package for reproducible interactive analysis and graphics of microbiome census data. PloS one. 2013;8:e61217.
2. Mariadassou M. Phyloseq-extended: Various customs functions written to enhance the base functions of phyloseq. 2018. https://github.com/mahendra-mariadassou/phyloseq-extended.
3. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15:550.
4. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological). 1995;57:289–300. http://www.jstor.org/stable/2346101.
5. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–9. doi:10.1093/bioinformatics/btw313.

Reuse

This document will not be accessible without prior agreement of the partners

A work by Migale Bioinformatics Facility
Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, 78350, Jouy-en-Josas, France