sites were chosen because they contained sequence data from the largest number of unique patients.38 hm-NAS genes and highly similar genes in the HMP non-redundant gene set (E-value < 1e−40) were aligned to shotgun sequencing reads from each patient sample taken from different sites in the human microbiome. Aligned reads were normalized to hm-NAS gene length and to the sequencing depth of each dataset. The normalized counts of reads aligned to each hm-NAS gene, or to its highly similar gene from the HMP non-redundant gene set, were scaled to [0–1], color coded per body site, and added as concentric rings around the phylogenetic tree (Fig. 1a). To determine the variability and distribution of hm-NAS genes that correspond to specific N-acyl amide families 1–6 (Fig. 1) in the human microbiome, normalized read counts for hm-NAS genes from each N-acyl amide family were plotted separately per body site as reads per kilobase of gene per million reads (RPKM) (Fig. 1c). The tree in Figure 1 was plotted using GraPhlAn (https://huttenhower.sph.harvard.edu/graphlan).

Analysis of metatranscriptome datasets
Two RNAseq datasets were identified that contained multiple patient samples taken from separate sites in the human microbiome.39,40 One RNAseq dataset was part of the HMP (http://hmpdacc.org/RSEQ/) and was generated from supragingival plaque samples taken from twin pairs with and without dental caries. The second RNAseq dataset was generated from stool samples and compared different RNA extraction methods. We used only samples labeled “whole”, which served as controls in the original study.39 Alignment of all hm-NAS genes to each dataset identified only hm-NAS genes from N-acyl amide families 1 and 2 (family 1 in stool, family 2 in supragingival plaque). To explore whether hm-NAS gene expression varies between patient samples, we performed two different analyses.
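The normalization and scaling described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' pipeline. The methods do not state how per-sample values were combined for each body site or which [0–1] scaling was used. The sketch therefore assumes that per-gene RPKM values are averaged across all samples from a body site and then min-max scaled within that site. The function names `rpkm`, `min_max_scale` and `ring_values`, and the input layout, are illustrative assumptions only.

```python
# Minimal sketch (hypothetical names): RPKM normalization of reads aligned to
# hm-NAS genes, then [0, 1] scaling per body site for the tree's concentric rings.
# Assumptions not stated in the methods: per-gene RPKM values are averaged over
# all samples from a body site, and the [0, 1] scaling is min-max within each site.
from collections import defaultdict
from statistics import mean


def rpkm(mapped_reads: int, gene_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of gene per million reads in the sample."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return mapped_reads / ((gene_length_bp / 1_000) * (total_reads / 1_000_000))


def min_max_scale(values: dict[str, float]) -> dict[str, float]:
    """Rescale values linearly to [0, 1]; if all values are equal, return 0.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def ring_values(samples, gene_lengths):
    """Return {body_site: {gene: scaled RPKM}} for drawing one ring per site.

    samples: iterable of (body_site, total_reads, {gene: mapped_reads});
             genes absent from a sample's dict are counted as 0 reads.
    gene_lengths: {gene: length in bp}.
    """
    per_site = defaultdict(lambda: defaultdict(list))
    for site, total_reads, counts in samples:
        for gene, length in gene_lengths.items():
            per_site[site][gene].append(rpkm(counts.get(gene, 0), length, total_reads))
    return {
        site: min_max_scale({gene: mean(vals) for gene, vals in genes.items()})
        for site, genes in per_site.items()
    }


if __name__ == "__main__":
    lengths = {"geneA": 1_000, "geneB": 2_000}
    samples = [
        ("stool", 2_000_000, {"geneA": 40, "geneB": 40}),
        ("stool", 1_000_000, {"geneA": 10}),
    ]
    print(ring_values(samples, lengths))  # {'stool': {'geneA': 1.0, 'geneB': 0.0}}
```

Here RPKM is mapped reads divided by (gene length in kb × sample reads in millions). For example, 40 reads on a 1 kb gene in a 2-million-read sample give 20 RPKM. Under the min-max assumption, ring colors are comparable across genes within a body site but not across sites.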
In the first analysis, we identified reference genomes containing hm-NAS genes identical to those used in our heterologous expression experiments for molecule families 1 and 2 (Bacteroides dorei for compound 1, Capnocytophaga ochracea for compound 2). RNAseq reads were aligned to all of the genes in each reference genome. For each genome, the average read density per gene, normalized for gene length, was compared with the read density of the hm-NAS gene. The percentile rank of each hm-NAS gene's normalized expression within its genome was then plotted (0 for not expressed, 1 for the most expressed) and compared between patient samples for each RNAseq dataset (Fig. 2a). In the second analysis, the direct correlation between DNA and RNA abundance was determined for the stool metatranscriptome dataset, for which DNA reads were also available.39 RNAseq reads and shotgun-sequenced DNA reads were aligned to the 15 hm-NAS genes from N-acyl amide family 1 that encode N-acyl glycines (Supplementary Table 1). The reads were normalized (RPKM), and each hm-NAS gene from each patient sample was plotted as a single point, with DNA and RNA read counts on the x and y axes, respectively (Fig. 2b).

Heterologous expression of PFAM13444 genes in Escherichia coli
The 44 hm-NAS genes examined by heterologous expression were codon optimized, appended with NcoI and NdeI sites at the N and C termini, respectively, and synthesized by Gen9. Genes obtained from Gen9 were digested with NcoI and NdeI and ligated into the corresponding restriction sites in pET28c (Novagen). For heterologous expression, the resulting constructs were transformed into E. coli EC100 containing the T7 polymerase
Cohen et al. Page 9 Nature. Author manuscript; available in PMC 2018 February 28.