Tag Archives: R

subsetting a phyloseq object by taxa name

Originally posted by me here: https://github.com/joey711/phyloseq/issues/261#issuecomment-27942652

I get an error when running subset_taxa in a function.

Code to reproduce error:

library(phyloseq)
getRversion()
packageDescription("phyloseq")$Version

test <- function(psdata){
  keep_taxa   <- taxa_names(psdata) %in% taxa_names(psdata)[1:5]
  print(keep_taxa[1:10])   # test that keep_taxa exists
  topNtaxa    <- subset_taxa(psdata, keep_taxa)
  print("success")
}

data(GlobalPatterns)
test(GlobalPatterns)

keep_taxa <- taxa_names(GlobalPatterns) %in% taxa_names(GlobalPatterns)[1:5]

test(GlobalPatterns)

My output when I run this:

> library(phyloseq)
> getRversion()
[1] ‘3.0.2’
> packageDescription("phyloseq")$Version
[1] "1.7.3"
> test <- function(psdata){
+ keep_taxa   <- taxa_names(psdata) %in% taxa_names(psdata)[1:5]
+ print(keep_taxa[1:10]) # test that keep exists
+ topNtaxa    <- subset_taxa(psdata, keep_taxa)
+ print("success")
+ }
> data(GlobalPatterns)
> test(GlobalPatterns)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
Error in eval(expr, envir, enclos) : object 'keep_taxa' not found
> keep_taxa <- taxa_names(GlobalPatterns) %in% taxa_names(GlobalPatterns)[1:5]
> test(GlobalPatterns)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
[1] "success"

It seems that subset_taxa is not looking for the subset logical vector in its current env but in the parent/Global env.

In the first run of test, subset_taxa cannot find the logical vector keep_taxa.

When I define a vector keep_taxa in the parent env test runs fine.

Aaron

Joey replied:

For the test you’ve demonstrated, you want to use prune_taxa rather than subset_taxa. The subset_taxa function wraps R core subset, which has some interesting nuances about when/where it looks for things.

In your example you explicitly define a vector taxa names to keep. That is exactly what prune_taxa is for. By contrast, subset_taxa expects its second argument to be an expression that can contain variable names related to the column names of the taxonomy table in the first argument, if it has a taxonomy table. That is, subset_taxa is intended for something like:

subset_taxa(GlobalPatterns, Kingdom=="Bacteria")

and the goal of your code above would be achieved with:

prune_taxa(keep_taxa, GlobalPatterns)

Hope that helps. Let me know if I missed something from your question, keeping in mind that the flexible evaluation behavior of subset_taxa is intentional, and should be analogous to subset.

Leave a comment

Filed under Bioinformatics hacking