TEMPO

Application of stable isotope probing to identify temperature responsive bacterial taxa at the level of 3-hydroxy fatty acid synthesis: Time T3 and T7

in progress

collaboration

Authors

Affiliation

Olivier Rué

Migale bioinformatics facility

Christelle Hennequet-Antier

Migale bioinformatics facility

Published

November 22, 2023

Modified

February 19, 2025

Note

This document is a report of the analyses performed. You will find all the code used to analyze these data. The version of the tools (maybe in code chunks) and their references are indicated, for questions of reproducibility.

Aim of the project

The aim of the project is to provide a better understanding of how soil bacteria respond to temperature variations at the membrane level. Bacterial lipids can be used in paleoclimatology to reconstruct temperature of past climates in marine and terrestrial archives. This involves the use of proxies and calculations that are based on the relative amounts of these lipids. In soil, only one organic proxy is available, mainly due to the difficulty to work on soil, which is much more heterogeneous in structure than aquatic environments. A drawback of this sungle proxy is the error which is associated with it as well as the lack of confirmation of the result by a different approach. A new proxy based on 3-hydroxy fatty acids has been proposed in 2016. To better understand and validate this proxy, as well as to investigate new more precise proxies, we aim at understanding what soil bacterial species contribute to the proposed proxy by investigating their response te temperature at the taxonomic and lipid levels. We have used a DNA- and lipid-SIP approach during a time-course experiment and are now at the stage of analyzing what species have used the 13C labelled substrate in a temperature dependent manner. The profile of the enriched taxa in the 13C fraction will be compared in the different samples (different temperature, different incubation time) and will be compared with the 13C-enriched lipid profiles in the same samples.

Partners

Olivier Rué - Migale bioinformatics facility - BioInfomics - INRAE
Christelle Hennequet-Antier - Migale bioinformatics facility - BioInfomics - INRAE
Sylvie Collin - Sorbonne Université

Deliverables

Deliverables agreed at the preliminary meeting (Table 1).

Table 1: Deliverables

	Definition
1	HTML report

Data management

Important

All data is managed by the migale facility for the duration of the project. Once the project is over, the Migale facility does not keep your data. We will provide you with the raw data and associated metadata that will be deposited on public repositories before the results are used. We can guide you in the submission process. We will then decide which files to keep, knowing that this report will also be provided to you and that the analyses can be replayed if needed.

Biostatistics

The phyloseq package [1] is a tool to import, store, analyze, and graphically display complex metabarcoding data, especially when there is associated sample data, phylogenetic tree, and/or taxonomic assignment of the OTUs or ASVs. Various customs functions written to enhance the base functions of phyloseq are available in the phyloseq-extended package [2].

The available metadata are:

Show the code

metadata <- read_tsv("../data/metadata.tsv", 
                     col_types = cols(SampleName = col_character(), 
                                      Time = col_factor(levels = c("T0", "T3", "T7", "T14", "T28"), ordered = TRUE), 
                                      Temperature = col_factor(levels = c("5", "15", "25"), ordered = TRUE),
                                      Replicate = col_factor(levels = c("1", "2", "3"), ordered = TRUE)
                     )
) %>%
column_to_rownames(var="SampleName")
 
metadata <- metadata %>% 
  mutate(Temperature_NotOrd = factor(Temperature, levels = c("5", "15", "25"), ordered = FALSE))  %>%
  mutate(
    Time_Gluc = case_when(
      Time == "T3" ~ paste(Time, Glucose, sep="_"),
      Time == "T7" ~ paste(Time, Glucose, sep="_"),
      .default = Time
    )) %>%
  mutate(Time_Gluc = factor(Time_Gluc,
                            levels = c("T0", "T3_12C", "T3_13C", "T7_12C", "T7_13C", "T14", "T28"))) %>%
  mutate(Group = paste(Time_Gluc, Temperature, sep="_")) %>%
  mutate(Group = factor(Group, 
    levels = c("T0_5", "T0_15", "T0_25", "T3_12C_5", "T3_13C_5", "T3_12C_15", "T3_13C_15", "T3_12C_25", "T3_13C_25", "T7_12C_5", "T7_13C_5", "T7_12C_15", "T7_13C_15", "T7_12C_25", "T7_13C_25", "T14_5", "T14_15", "T14_25", "T28_5", "T28_15", "T28_25"))
  ) %>% 
  mutate(
    SampleLabel = case_when(
      is.na(FractionOnCsClGradient) == TRUE ~ paste(Group, Replicate, sep="_"),
    .default = paste(Group, FractionOnCsClGradient, Replicate, sep="_"))
)

#spec(metadata)
# metadata %>% 
#   datatable()

Show the code

biomfile <- paste0("../html/","affiliation.biom")
physeq <- import_frogs(biomfile, taxMethod = "blast")
sample_data(physeq) <- metadata
phy_tree(physeq) <- phyloseq::read_tree("../html/tree.nwk")
physeq <- physeq %>% microViz::ps_arrange(Time_Gluc, Temperature, FractionOnCsClGradient)
# visualisation and save
metadata %>% 
  datatable()

Show the code

microbiome::summarize_phyloseq(physeq)

[[1]]
[1] "1] Min. number of reads = 10643"

[[2]]
[1] "2] Max. number of reads = 48037"

[[3]]
[1] "3] Total number of reads = 4010689"

[[4]]
[1] "4] Average number of reads = 29708.8074074074"

[[5]]
[1] "5] Median number of reads = 31149"

[[6]]
[1] "7] Sparsity = 0.218730882381593"

[[7]]
[1] "6] Any OTU sum to 1 or less? NO"

[[8]]
[1] "8] Number of singletons = 0"

[[9]]
[1] "9] Percent of OTUs that are singletons \n        (i.e. exactly one read detected across all samples)0"

[[10]]
[1] "10] Number of sample variables are: 10"

[[11]]
 [1] "Time"                   "Temperature"            "SampleID"              
 [4] "Replicate"              "FractionOnCsClGradient" "Glucose"               
 [7] "Temperature_NotOrd"     "Time_Gluc"              "Group"                 
[10] "SampleLabel"

Show the code

saveRDS(object = physeq, file="physeq.rds")

Show the code

# count number of samples per condition
sample_data(physeq) %>% 
  as_tibble() %>% 
  dplyr::count(Time)

# A tibble: 5 × 2
  Time      n
  <ord> <int>
1 T0        9
2 T3       54
3 T7       54
4 T14       9
5 T28       9

Show the code

sample_data(physeq) %>% 
  as_tibble() %>% 
  dplyr::add_count(Time, Temperature, Glucose) %>%
  dplyr::select(Time, Temperature, , Glucose, Group, n) %>%
  distinct() %>%
  ggplot(., aes(x = Group, y = n, fill = Time)) + 
  geom_bar(stat = "identity") +
  labs(title="Number of replicates", y = "n") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 90, size = 8, hjust = 1))

Show the code

# frequencies
#microbiome::plot_frequencies(sample_data(physeq), "Time", "Group")

Note

For analysis purposes, the variable Time_Gluc is created as the combination of the Time and Glucose variables.

Select samples from Time T3 and T7

Show the code

physeq <- physeq %>% microViz::ps_filter(Time %in% c("T3", "T7"))

Rarefaction

The common practice for comparing microbiome samples of different library sizes is to reduce the size of the larger samples using rarefaction until they contain the same number of reads as the smallest sample. The rarefaction curves show the species richness (i.e. number of species) against the library size and the vertical grey line indicates the smallest sample’s size. Each curve indicates when the sampling depth is sufficient to achieve an acceptable number of species discoveries and when increasing the depth leads to the discovery of rarer species. Rarefied data are used to estimate and compare alpha and beta diversity across samples.

Before rarefaction
After rarefaction

Show the code

phyloseq::plot_bar(physeq, x = "SampleLabel", fill = "Phylum") + 
  facet_grid(~ Time_Gluc + Temperature, scales = "free_x", space = "free") +
 # facet_grid(~Time + Temperature + Glucose, scales = "free_x", space = "free") +
  theme(axis.text.x = element_text(size = 6, hjust = 1)) +
  theme(legend.position="top")

Show the code

# sample_sums(physeq) %>% sort() %>% head(n=5)
# remove samples with less than nbreads_threshold 
nbreads_threshold  <- 11000
sample_sums(physeq)[!sample_sums(physeq) > nbreads_threshold]

MOU32_1_15_13C_F8_T7 
               10643

Show the code

physeq <- microViz::ps_filter(physeq, sample_sums(physeq) > nbreads_threshold)

Show the code

ggrare(physeq = physeq, step = 500, se = FALSE, color = "Group", plot = FALSE, label = "SampleLabel", verbose = FALSE) + 
  # scale_colour_manual(values = group_pal) +
  geom_vline(xintercept = c(min(sample_sums(physeq))), color = "gray60") + 
  ggtitle("Rarefaction curves")

Show the code

physeq_rare <- phyloseq::rarefy_even_depth(physeq, rngseed = 2002, replace = TRUE)

Show the code

phyloseq::plot_bar(physeq_rare, fill="Phylum") + 
  facet_grid(~ Time_Gluc + Temperature, scales = "free_x", space = "free") +
  theme(axis.text.x = element_text(size = 6, hjust = 1)) +
  theme(legend.position="top")

Results

One sample was removed due to a low number of reads, less than 11000.

After rarefaction, all depths are equal to 15765. The dataset contains 1686855 sequences allocated to 1901 taxa and 107 samples.

Taxonomic composition

Show the code

phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Kingdom", taxaSet1 = "Bacteria", taxaRank2 = "Phylum", numberOfTaxa = 10L) +
  facet_grid(~ Time_Gluc + Temperature, scales = "free_x", space = "free") +
  theme(axis.text.x = element_text(size = 6, hjust = 1)) +
  theme(legend.position="top")

Show the code

phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Kingdom", taxaSet1 = "Bacteria", taxaRank2 = "Family", numberOfTaxa = 10L) +
  facet_grid(~ Time_Gluc + Temperature, scales = "free_x", space = "free") +
  theme(axis.text.x = element_text(size = 6, hjust = 1)) +
  theme(legend.position="top")

Problematic taxa
                 taxa  Kingdom      Phylum  Class   Order  Family rank
Cluster_23 Cluster_23 Bacteria Chloroflexi KD4-96 Unknown Unknown    9

Results

Firmicutes, Actinobacteriota and Proteobacteria are dominants in major samples. Firmicutes appear mainly at Temperature > 15`. Bacteroidota, Methylomirabilota and Myxococcota are few abundants.

Focus on the genus rank across these major phyla.

Show the code

phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Phylum", taxaSet1 = "Actinobacteriota", taxaRank2 = "Genus") +
  facet_grid( ~ Time_Gluc + Temperature, scales = "free_x", space = "free_x") + 
  labs(title = "Relative composition", subtitle = "Actinobacteriota")

Problematic taxa
                 taxa  Kingdom           Phylum           Class
Cluster_5   Cluster_5 Bacteria Actinobacteriota  Actinobacteria
Cluster_44 Cluster_44 Bacteria Actinobacteriota Thermoleophilia
Cluster_57 Cluster_57 Bacteria Actinobacteriota Thermoleophilia
Cluster_26 Cluster_26 Bacteria Actinobacteriota       MB-A2-108
                         Order         Family             Genus rank
Cluster_5        Micrococcales Micrococcaceae Multi-affiliation    3
Cluster_44          Gaiellales        Unknown           Unknown    4
Cluster_57 Solirubrobacterales          67-14           Unknown    5
Cluster_26             Unknown        Unknown           Unknown    7

Show the code

phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Phylum", taxaSet1 = "Proteobacteria", taxaRank2 = "Genus") +
  facet_grid( ~ Time_Gluc + Temperature, scales = "free_x", space = "free_x") + 
  labs(title = "Relative composition", subtitle = "Proteobacteria")

Problematic taxa
                   taxa  Kingdom         Phylum               Class
Cluster_6     Cluster_6 Bacteria Proteobacteria Alphaproteobacteria
Cluster_12   Cluster_12 Bacteria Proteobacteria Alphaproteobacteria
Cluster_16   Cluster_16 Bacteria Proteobacteria Alphaproteobacteria
Cluster_27   Cluster_27 Bacteria Proteobacteria Alphaproteobacteria
Cluster_143 Cluster_143 Bacteria Proteobacteria Gammaproteobacteria
                      Order             Family             Genus rank
Cluster_6       Rhizobiales  Xanthobacteraceae           Unknown    2
Cluster_12      Rhizobiales Methyloligellaceae           Unknown    3
Cluster_16      Rhizobiales  Xanthobacteraceae Multi-affiliation    5
Cluster_27       Elsterales            Unknown           Unknown    7
Cluster_143 Burkholderiales            SC-I-84           Unknown    8

Show the code

phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Phylum", taxaSet1 = "Firmicutes", taxaRank2 = "Genus") +
  facet_grid( ~ Time_Gluc + Temperature, scales = "free_x", space = "free_x") + 
  labs(title = "Relative composition", subtitle = "Firmicutes")

Show the code

phyloseq.extended::plot_composition(physeq = physeq_rare, taxaRank1 = "Phylum", taxaSet1 = "Bacteroidota", taxaRank2 = "Genus") +
  facet_grid( ~ Time_Gluc + Temperature, scales = "free_x", space = "free_x") + 
  labs(title = "Relative composition", subtitle = "Bacteroidota")

Problematic taxa
                   taxa  Kingdom       Phylum       Class           Order
Cluster_209 Cluster_209 Bacteria Bacteroidota Bacteroidia Chitinophagales
Cluster_179 Cluster_179 Bacteria Bacteroidota Bacteroidia    Cytophagales
Cluster_655 Cluster_655 Bacteria Bacteroidota Bacteroidia Chitinophagales
                      Family   Genus rank
Cluster_209 Chitinophagaceae Unknown    2
Cluster_179  Microscillaceae Unknown    3
Cluster_655   Saprospiraceae Unknown    7

Alpha-diversity

Alpha-diversity represents diversity within a sample (i.e. ecosystem). The most commonly used measures of diversity are : Observed, Chao1, Shannon, and Simpson. Richness (i.e. Observed) is simply the number of species present in the sample and Shannon is another measure based on species richness. Chao1 is an estimator of the number of species applied to the presence/absence data for each species. The Simpson index gives more importance to abundant species.

Show the code

# plot alpha diversity
phyloseq::plot_richness(physeq = physeq_rare, x = "Group", color= "Group", title = "Alpha diversity graphics", measures = c("Observed", "Chao1", "Shannon", "Simpson", "InvSimpson"), nrow=5) + 
  geom_jitter() +
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
  theme(legend.position="top")

Show the code

# plot alpha diversity 
p.alphadiversity <- microbiome::boxplot_alpha(physeq_rare, 
                           index = c("Observed"),
                           x_var = "Group",
                           )
p.alphadiversity +
    facet_grid(cols = vars(Time_Gluc, Temperature), scales = "free_x", space = "free_x") + 
    theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
    theme(legend.position="top")

Show the code

# plot alpha diversity 
p.alphadiversity <- microbiome::boxplot_alpha(physeq_rare, 
                           index = c("Chao1"),
                           x_var = "Group",
                           )
p.alphadiversity +
    facet_grid(cols = vars(Time_Gluc, Temperature), scales = "free_x", space = "free_x") + 
    theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
    theme(legend.position="top")

Show the code

# plot alpha diversity 
p.alphadiversity <- microbiome::boxplot_alpha(physeq_rare, 
                           index = c("Shannon"),
                           x_var = "Group",
                           )
p.alphadiversity +
    facet_grid(cols = vars(Time_Gluc, Temperature), scales = "free_x", space = "free_x") + 
    theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
    theme(legend.position="top")

Show the code

# plot alpha diversity 
p.alphadiversity <- microbiome::boxplot_alpha(physeq_rare, 
                           index = c("Simpson"),
                           x_var = "Group",
                           )
p.alphadiversity +
    facet_grid(cols = vars(Time_Gluc, Temperature), scales = "free_x", space = "free_x") + 
    theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
    theme(legend.position="top")

Results

The alpha diversities decrease with Temperature. At Temperature 25, the observed diversity is different between 12C and 13C. Statistical tests are needed to explore the effect of Temperature and Time_Gluc.

Alpha-diversity - statistical tests

We calculated the alpha-diversity indices and combined them with covariates from experimental design. We tested the influence Time_Gluc and Temperature on Observed alpha-diversity.

Show the code

div_data <- cbind(estimate_richness(physeq_rare),  ## diversity indices
                  sample_data(physeq_rare)         ## covariates
                  )

We tested the influence Time_Gluc and Temperature on Observed alpha-diversity.

Show the code

alphadiv.lm <- lm(Observed ~ Time_Gluc*Temperature, data = div_data)
anova(alphadiv.lm)

Analysis of Variance Table

Response: Observed
                      Df  Sum Sq Mean Sq F value    Pr(>F)    
Time_Gluc              3  924825  308275 12.9052 3.836e-07 ***
Temperature            2 3295336 1647668 68.9759 < 2.2e-16 ***
Time_Gluc:Temperature  6  399862   66644  2.7899   0.01525 *  
Residuals             95 2269322   23888                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

# interaction visualisation
emmip(alphadiv.lm, Time_Gluc ~ Temperature, CIs = TRUE)

Results

Richness (Observed diversity) is significantly different between Time_Gluc and Temperature and the interaction.

Show the code

#tests
EMM <- emmeans(alphadiv.lm, ~ Time_Gluc*Temperature)
# tests correction are performed within the variable
# P value adjustment: tukey method for comparing
comp1 <- pairs(EMM, simple = "Time_Gluc")
# P value adjustment: tukey method for comparing
comp2 <- pairs(EMM, simple = "Temperature")

Show the code

# P value adjustment: tukey method for comparing
res_emm <- contrast(EMM, method = "revpairwise", by = NULL,
    enhance.levels = c("Time_Gluc", "Temperature"), adjust = "tukey")

Results

Richness (Observed diversity) is significantly different between Time_Gluc for some pairwise comparisons at Temperature 15 and 25 at significance level 0.05. At Temperature = 25, the difference is significative between T3_12C and T3_13C, and also between T7_12C and T7_13C. Richness is significantly different between Temperature for each Time_Gluc level except between Temperature = 5 - Temperature = 15 at Time_Gluc = T7_13C. See the results in the file file.

	Description
	Excel file containing the results of alpha diversity tests for rarefied samples at T3 and T7.

Linear mixed model with interaction

We tested the influence Time_Gluc and Temperature on Observed alpha-diversity. We performed a linear mixed model with Time_Gluc, Temperature and the interaction Time_Gluc x Temperature as fixed effects and FractionOnCsClGradientas a random effect. The random error term depends on each FractionOnCsClGradient for the intercept .

Show the code

# model with only fixed effects 
# look at the most important factors
# alphadiv.aov <- aov(Observed ~ 0 + Time_Gluc*Temperature_NotOrd, data = div_data)
# summary(alphadiv.aov)

# add a random effect
# Note that the ORDER of effects is important
# alphadiv.lmer2 <- lmer(Observed ~ 0 + Time_Gluc*Temperature_NotOrd + (1 + Time_Gluc + Temperature_NotOrd | FractionOnCsClGradient), data = div_data)
# summary(alphadiv.lmer2)
# anova(alphadiv.lmer2)
# ranova(alphadiv.lmer2)
alphadiv.lmer1 <- lmer(Observed ~ 0 + Time_Gluc*Temperature_NotOrd + (1 | FractionOnCsClGradient), data = div_data)
summary(alphadiv.lmer1)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: 
Observed ~ 0 + Time_Gluc * Temperature_NotOrd + (1 | FractionOnCsClGradient)
   Data: div_data

REML criterion at convergence: 1205.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.70976 -0.60584  0.04683  0.62159  1.82992 

Random effects:
 Groups                 Name        Variance Std.Dev.
 FractionOnCsClGradient (Intercept) 14118    118.8   
 Residual                           13321    115.4   
Number of obs: 107, groups:  FractionOnCsClGradient, 3

Fixed effects:
                                     Estimate Std. Error       df t value
Time_GlucT3_12C                      1451.222     78.652    3.274  18.451
Time_GlucT3_13C                      1433.444     78.652    3.274  18.225
Time_GlucT7_12C                      1427.333     78.652    3.274  18.147
Time_GlucT7_13C                      1304.222     78.652    3.274  16.582
Temperature_NotOrd15                  -10.778     54.408   93.000  -0.198
Temperature_NotOrd25                 -323.222     54.408   93.000  -5.941
Time_GlucT3_13C:Temperature_NotOrd15  -50.111     76.945   93.000  -0.651
Time_GlucT7_12C:Temperature_NotOrd15 -120.000     76.945   93.000  -1.560
Time_GlucT7_13C:Temperature_NotOrd15 -164.190     78.163   93.003  -2.101
Time_GlucT3_13C:Temperature_NotOrd25 -249.222     76.945   93.000  -3.239
Time_GlucT7_12C:Temperature_NotOrd25   33.778     76.945   93.000   0.439
Time_GlucT7_13C:Temperature_NotOrd25 -129.667     76.945   93.000  -1.685
                                     Pr(>|t|)    
Time_GlucT3_12C                      0.000202 ***
Time_GlucT3_13C                      0.000210 ***
Time_GlucT7_12C                      0.000213 ***
Time_GlucT7_13C                      0.000285 ***
Temperature_NotOrd15                 0.843407    
Temperature_NotOrd25                 4.88e-08 ***
Time_GlucT3_13C:Temperature_NotOrd15 0.516486    
Time_GlucT7_12C:Temperature_NotOrd15 0.122262    
Time_GlucT7_13C:Temperature_NotOrd15 0.038381 *  
Time_GlucT3_13C:Temperature_NotOrd25 0.001664 ** 
Time_GlucT7_12C:Temperature_NotOrd25 0.661690    
Time_GlucT7_13C:Temperature_NotOrd25 0.095306 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
                T_GT3_12 Tm_GT3_13C Tm_GT7_12C Tm_GT7_13C T_NO15 T_NO25
Tm_GlT3_13C      0.761                                                 
Tm_GlT7_12C      0.761    0.761                                        
Tm_GlT7_13C      0.761    0.761      0.761                             
Tmprtr_NO15     -0.346    0.000      0.000      0.000                  
Tmprtr_NO25     -0.346    0.000      0.000      0.000      0.500       
T_GT3_13C:T_NO1  0.245   -0.245      0.000      0.000     -0.707 -0.354
T_GT7_12C:T_NO1  0.245    0.000     -0.245      0.000     -0.707 -0.354
T_GT7_13C:T_NO1  0.241    0.000      0.000     -0.241     -0.696 -0.348
T_GT3_13C:T_NO2  0.245   -0.245      0.000      0.000     -0.354 -0.707
T_GT7_12C:T_NO2  0.245    0.000     -0.245      0.000     -0.354 -0.707
T_GT7_13C:T_NO2  0.245    0.000      0.000     -0.245     -0.354 -0.707
                T_GT3_13C:T_NO1 T_GT7_12C:T_NO1 T_GT7_13C:T_NO1 T_GT3_13C:T_NO2
Tm_GlT3_13C                                                                    
Tm_GlT7_12C                                                                    
Tm_GlT7_13C                                                                    
Tmprtr_NO15                                                                    
Tmprtr_NO25                                                                    
T_GT3_13C:T_NO1                                                                
T_GT7_12C:T_NO1  0.500                                                         
T_GT7_13C:T_NO1  0.492           0.492                                         
T_GT3_13C:T_NO2  0.500           0.250           0.246                         
T_GT7_12C:T_NO2  0.250           0.500           0.246           0.500         
T_GT7_13C:T_NO2  0.250           0.250           0.492           0.500         
                T_GT7_12C:T_NO2
Tm_GlT3_13C                    
Tm_GlT7_12C                    
Tm_GlT7_13C                    
Tmprtr_NO15                    
Tmprtr_NO25                    
T_GT3_13C:T_NO1                
T_GT7_12C:T_NO1                
T_GT7_13C:T_NO1                
T_GT3_13C:T_NO2                
T_GT7_12C:T_NO2                
T_GT7_13C:T_NO2  0.500

Show the code

anova(alphadiv.lmer1)

Type III Analysis of Variance Table with Satterthwaite's method
                              Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
Time_Gluc                    6056031 1514008     4  2.826 113.654 0.0018165 ** 
Temperature_NotOrd           3304834 1652417     2 93.001 124.044 < 2.2e-16 ***
Time_Gluc:Temperature_NotOrd  390363   65061     6 93.001   4.884 0.0002222 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

ranova(alphadiv.lmer1)

ANOVA-like table for random-effects: Single term deletions

Model:
Observed ~ Time_Gluc + Temperature_NotOrd + (1 | FractionOnCsClGradient) + Time_Gluc:Temperature_NotOrd - 1
                             npar  logLik    AIC    LRT Df Pr(>Chisq)    
<none>                         14 -602.69 1233.4                         
(1 | FractionOnCsClGradient)   13 -626.78 1279.5 48.166  1  3.917e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Show the code

plot(alphadiv.lmer1)

Show the code

# Test for normality
# par(mfrow = c(1, 2))
# qqnorm(ranef(alphadiv.lmer1)$FractionOnCsClGradient[,"(Intercept)"], 
#        main = "Random effects")
# qqnorm(resid(alphadiv.lmer1), main = "Residuals")


# interaction visualisation
emmip(alphadiv.lmer1, Time_Gluc ~ Temperature_NotOrd, CIs = TRUE)

Results

Richness (Observed diversity) is significantly different between Time_Gluc and Temperature and the interaction. The random effect FractionOnCsClGradient is significatively different between Intercept but do not depend on Time and Temperature (not shown).

Show the code

#tests
EMM <- emmeans(alphadiv.lmer1, ~ Time_Gluc*Temperature_NotOrd)
# tests correction are performed within the variable
# P value adjustment: tukey method for comparing
comp1 <- pairs(EMM, simple = "Time_Gluc")
# P value adjustment: tukey method for comparing
comp2 <- pairs(EMM, simple = "Temperature_NotOrd")

Results

Richness (Observed diversity) is significantly different between Time_Gluc for some pairwise comparisons at each Temperature at significance level 0.05. Richness is significantly different between Temperature for each Time_Gluc level except between Temperature = 5 - Temperature = 15 at Time = T3. See the results in the file file.

	Description
	Excel file containing the results of alpha diversity tests for rarefied samples at T3 and T7.

Beta-diversity

The dissimilarity between samples based on taxonomic composition is calculated with one of these Beta diversities: Bray-Curtis, Unweighted and weighted Unifrac, Jaccard index. The Jaccard index uses presence/absence information only whereas UniFrac integrates information from the phylogenetic tree.

Show the code

dist.jac <- phyloseq::distance(physeq_rare, method = "cc")
dist.bc <- phyloseq::distance(physeq_rare, method = "bray")
dist.uf <- phyloseq::distance(physeq_rare, method = "unifrac")
dist.wuf <- phyloseq::distance(physeq_rare, method = "wunifrac")
distance_list <- list(
  "Jaccard"     = dist.jac,
  "Bray-Curtis" = dist.bc,
  "UniFrac"     = dist.uf,
  "wUniFrac"    = dist.wuf
)

Show the code

# Mahendra's function
my_plot <- function(variable1, variable2, physeq = mydata_rare) {
  .plot <- function(distance) {
    .distance <- distance_list[[distance]] %>% as.matrix() %>% `[`(sample_names(physeq), sample_names(physeq)) %>% as.dist()
    p <- plot_ordination(physeq,
                    ordinate(physeq, method = "MDS", distance = .distance),
                    color = variable1, shape = variable2) + 
      scale_color_brewer(palette = "Paired") +
      theme(legend.position = "none") +
      ggtitle(distance)
    # print(p)
  }
  plots <- lapply(names(distance_list), .plot)
  legend <- cowplot::get_legend(plots[[1]] + theme(legend.position = "bottom"))
  pgrid <- cowplot::plot_grid(plotlist = plots)
  cowplot::plot_grid(pgrid, legend, ncol = 1, rel_heights = c(1, .1))
}

# Mahendra's function
# used own color's palette and shape
my_plot2 <- function(variable1, variable2, physeq = mydata_rare, shapeval, colorpal) {
  .plot <- function(distance) {
    .distance <- distance_list[[distance]] %>% as.matrix() %>% `[`(sample_names(physeq), sample_names(physeq)) %>% as.dist()
    p <- plot_ordination(physeq,
                    ordinate(physeq, method = "MDS", distance = .distance),
                    color = variable1, shape = variable2) + 
      scale_shape_manual(values = shapeval) +
      scale_color_manual(name = colorpal$name,
                         labels = colorpal$labels,
                         values = colorpal$values
                     ) +
      theme(legend.position = "none") +
      ggtitle(distance)
    # print(p)
  }
  plots <- lapply(names(distance_list), .plot)
  legend <- cowplot::get_legend(plots[[1]] + theme(legend.position = "bottom"))
  pgrid <- cowplot::plot_grid(plotlist = plots)
  cowplot::plot_grid(pgrid, legend, ncol = 1, rel_heights = c(1, .1))
}

# Mahendra's function
# used own color's palette and shape
my_pcoa_with_arrows2 <- function(physeq = mydata_rare, variable1, variable2, shapeval, colorpal) {
  # select samples of interest where all samples were used to calculate the distances between samples 
  # dist.jac is previously calculated with all samples of the study
  #dist.c <- phyloseq::distance(physeq, "jaccard")
  dist.c <- dist.bc %>% as.matrix() %>% `[`(sample_names(physeq), sample_names(physeq)) %>% as.dist()
  ord.c <- ordinate(physeq, "MDS", dist.bc)
  p <- plot_ordination(physeq, ord.c)
  # ## Build arrow data
  plot_data <- p$data 
  arrow_data <- plot_data %>% 
    group_by(pick({{variable2}}, {{variable1}})) %>%
    arrange({{variable1}}) %>%
    mutate(xend = Axis.1, xstart = lag(xend),
           yend = Axis.2, ystart = lag(yend)
    )
  ## Plot with arrows
  p + aes(shape = .data[[ variable2  ]], color = .data[[ variable1  ]]) +
      scale_shape_manual(values = shapeval) +
      scale_color_manual(name = colorpal$name,
                         labels = colorpal$labels,
                         values = colorpal$values) +
    theme_bw() +
    geom_segment(data = arrow_data,
                 aes(x = xstart, y = ystart, xend = xend, yend = yend),
                 linewidth = 0.8, linetype = "3131",
            #     arrow = arrow(length=unit(0.1,"inches")),
                 show.legend = FALSE) +
    geom_point(size = 3) +
    theme(legend.position = "bottom")
}

Show the code

# MDS plot for samples of interests 
# from distance calculated with all samples of the complete study
# used https://colorbrewer2.org/#type=qualitative&scheme=Paired&n=12

# Time_pal <- list(
#   name = "Time",
#   labels = c("T0",  "T3",  "T7",  "T14", "T28"),
#   values = c("#1f78b4", "#b2df8a", "#984ea3", "#ff7f00", "#a65628")
# )


Time_Gluc_pal <- list(
  name = "Time_Gluc",
  labels = c("T3_12C", "T3_13C", "T7_12C", "T7_13C"),
  values = c("#b2df8a", "#33a02c", "#cab2d6", "#984ea3")
)
Temperature_shape <- list(values = c(16,17,15))

my_plot2(variable1 = "Time_Gluc",
        variable2 = "Temperature",
        physeq = physeq_rare,
        shapeval = Temperature_shape$values,
        colorpal = Time_Gluc_pal
        )

Show the code

# MDS plot for samples of interests 
# from distance calculated with all samples of the complete study
# used https://colorbrewer2.org/#type=qualitative&scheme=Paired&n=12

p <- my_pcoa_with_arrows2(variable1 = "Time_Gluc",
        variable2 = "Temperature",
        physeq = physeq_rare,
        shapeval = Temperature_shape$values,
        colorpal = Time_Gluc_pal)
p

MDS plot

The Multidimensional scaling (MDS / PCoA) based on Bray-Curtis distance showed that samples were distributed according to Time_Gluc and Temperature variables on the first two axes. Differences are more important between Temperature 15 and 25. We observed also differences between 12C and 13C. The variance in community composition is mainly explained by Temperature on the axis 1 and by Glucose on the axis 2.

Beta-diversity - Tests

Show the code

metadata <- sample_data(physeq_rare) %>% as("data.frame")
# assess significance for each term sequentially from first to last
# importance of the order of the introduced factors in the model
model <- vegan::adonis2(dist.bc ~ Temperature*Time_Gluc, data = metadata, permutations = 9999, by = "terms")
print(model)

Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 9999

vegan::adonis2(formula = dist.bc ~ Temperature * Time_Gluc, data = metadata, permutations = 9999, by = "terms")
                       Df SumOfSqs      R2       F Pr(>F)    
Temperature             2   7.5330 0.41569 62.4153  1e-04 ***
Time_Gluc               3   3.4981 0.19303 19.3225  1e-04 ***
Temperature:Time_Gluc   6   1.3578 0.07493  3.7501  1e-04 ***
Residual               95   5.7328 0.31635                   
Total                 106  18.1217 1.00000                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Results

The change in microbiota composition measured by the Bray-Curtis Beta diversity is significantly different between Time_Gluc, and Temperature and also between the levels of the interaction factor Time_Gluc:Temperature at 0.05 significance level.

Show the code

par(mar = c(3, 0, 2, 0), cex = 0.6)
phyloseq.extended::plot_clust(physeq_rare, dist = "bray", method = "ward.D2", color = "Group",
           title = "Clustering of samples (Bray-Curtis + Ward)")

Results

Hierarchical clustering based on the Bray-Curtis dissimilarity and the Ward aggregation criterion shows that sample composition is not entirely structured by Time_Gluc and Temperature.

At Phylum taxonomic levels, we created heatmap plot using ordination methods to organize the rows.

Family

Show the code

#  method = NULL if not ordination
p <- phyloseq::plot_heatmap(tax_glom(physeq_rare, taxrank="Phylum"), 
                  method = "MDS",
                  distance = "bray",
                  sample.order= "SampleLabel", # not ordination
                 # taxa.order = "Family",
                  taxa.label = "Phylum") + 
  facet_grid(~ Time_Gluc, scales = "free_x", space = "free_x") + 
  scale_fill_gradient2(low = "#1a9850", mid = "#ffffbf", high = "#d73027",
                       na.value = "white", trans = scales::log_trans(4),
                       midpoint = log(100, base = 4))
p

Reproducibility token

Show the code

sessioninfo::session_info(pkgs = "attached")

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       Ubuntu 24.04.1 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  fr_FR.UTF-8
 ctype    fr_FR.UTF-8
 tz       Europe/Paris
 date     2025-02-18
 pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package                  * version date (UTC) lib source
 Biobase                  * 2.64.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 BiocGenerics             * 0.50.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 Biostrings               * 2.72.1  2024-06-02 [1] Bioconductor 3.19 (R 4.4.0)
 ComplexHeatmap           * 2.20.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 data.table               * 1.16.4  2024-12-06 [1] CRAN (R 4.4.2)
 dplyr                    * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
 DT                       * 0.33    2024-04-04 [1] CRAN (R 4.4.0)
 emmeans                  * 1.10.7  2025-01-31 [1] CRAN (R 4.4.2)
 forcats                  * 1.0.0   2023-01-29 [1] CRAN (R 4.4.0)
 GenomeInfoDb             * 1.40.1  2024-05-24 [1] Bioconductor 3.19 (R 4.4.0)
 GenomicRanges            * 1.56.2  2024-10-09 [1] Bioconductor 3.19 (R 4.4.1)
 ggplot2                  * 3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
 ggtree                   * 3.12.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 ggtreeExtra              * 1.14.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 InraeThemes              * 3.0.0   2024-05-21 [1] Github (davidcarayon/InraeThemes@9ee5259)
 IRanges                  * 2.38.1  2024-07-03 [1] Bioconductor 3.19 (R 4.4.1)
 kableExtra               * 1.4.0   2024-01-24 [1] CRAN (R 4.4.0)
 lme4                     * 1.1-36  2025-01-11 [1] CRAN (R 4.4.2)
 lmerTest                 * 3.1-3   2020-10-23 [1] CRAN (R 4.4.2)
 lubridate                * 1.9.4   2024-12-08 [1] CRAN (R 4.4.2)
 Matrix                   * 1.7-1   2024-10-18 [4] CRAN (R 4.4.1)
 MatrixGenerics           * 1.16.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 matrixStats              * 1.5.0   2025-01-07 [1] CRAN (R 4.4.2)
 metacoder                * 0.3.7   2024-10-03 [1] Github (grunwaldlab/metacoder@54ef8ea)
 mia                      * 1.15.6  2025-01-07 [1] Github (microbiome/mia@2dc3430)
 MicrobiotaProcess        * 1.16.1  2024-07-28 [1] Bioconductor 3.19 (R 4.4.1)
 MultiAssayExperiment     * 1.30.3  2024-07-10 [1] Bioconductor 3.19 (R 4.4.2)
 openxlsx                 * 4.2.8   2025-01-25 [1] CRAN (R 4.4.2)
 phyloseq                 * 1.48.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)
 phyloseq.extended        * 0.1.4.1 2024-05-21 [1] Github (mahendra-mariadassou/phyloseq-extended@150a871)
 purrr                    * 1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
 readr                    * 2.1.5   2024-01-10 [1] CRAN (R 4.4.0)
 S4Vectors                * 0.42.1  2024-07-03 [1] Bioconductor 3.19 (R 4.4.1)
 scales                   * 1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
 SingleCellExperiment     * 1.26.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.2)
 stringr                  * 1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
 SummarizedExperiment     * 1.34.0  2024-05-01 [1] Bioconductor 3.19 (R 4.4.0)
 tibble                   * 3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
 tidyr                    * 1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
 tidyverse                * 2.0.0   2023-02-22 [1] CRAN (R 4.4.0)
 TreeSummarizedExperiment * 2.12.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.2)
 viridis                  * 0.6.5   2024-01-29 [1] CRAN (R 4.4.0)
 viridisLite              * 0.4.2   2023-05-02 [1] CRAN (R 4.4.0)
 XVector                  * 0.44.0  2024-04-30 [1] Bioconductor 3.19 (R 4.4.0)

 [1] /home/orue/R/x86_64-pc-linux-gnu-library/4.4
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────

References

1. McMurdie PJ, Holmes S. Phyloseq: An r package for reproducible interactive analysis and graphics of microbiome census data. PloS one. 2013;8:e61217.

2. Mariadassou M. Phyloseq-extended: Various customs functions written to enhance the base functions of phyloseq. 2018. https://github.com/mahendra-mariadassou/phyloseq-extended.

Reuse

This document will not be accessible without prior agreement of the partners

A work by Migale Bioinformatics Facility
Université Paris-Saclay, INRAE, MaIAGE, 78350, Jouy-en-Josas, France
Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, 78350, Jouy-en-Josas, France