skbio.diversity.block_beta_diversity
- skbio.diversity.block_beta_diversity(metric, counts, ids, validate=True, k=64, reduce_f=None, map_f=None, **kwargs)
Perform a block-decomposition beta diversity calculation.
State: Experimental as of 0.5.1.
- Parameters
metric (str or callable) – The pairwise distance function to apply. If metric is a string, it must be resolvable by scikit-bio (e.g., UniFrac methods); otherwise, it must be callable.
counts (2D array_like of ints or floats) – Matrix containing count/abundance data where each row contains counts of OTUs in a given sample.
ids (iterable of strs) – Identifiers for each sample in counts.
validate (bool, optional) – See skbio.diversity.beta_diversity for details.
reduce_f (function, optional) –
A method to reduce PartialDistanceMatrix objects into a single DistanceMatrix. The expected signature is:
f(Iterable of DistanceMatrix) -> DistanceMatrix
Note that this is the reduce step of a map/reduce.
map_f (function, optional) –
A method that accepts a _block_compute. The expected signature is:
f(**kwargs) -> DistanceMatrix
NOTE: ipyparallel’s map_async will not work here because the keyword arguments (**kwargs) must be passed through to the mapped function.
k (int, optional) – The block size used when computing distances.
kwargs (kwargs, optional) – Metric-specific parameters.
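The map_f note above says the mapped function is called with keyword arguments, which is why a positional-only map is unsuitable. A minimal sketch of a keyword-aware parallel map follows. It does not use scikit-bio: threaded_kwargs_map and toy_block are hypothetical names, and the assumption that map_f receives the block function plus an iterable of keyword dictionaries should be checked against the source of the installed version.

```python
from concurrent.futures import ThreadPoolExecutor


def threaded_kwargs_map(func, kwargs_iter, max_workers=4):
    """Call func(**kwargs) for each kwargs dict using a thread pool.

    Hypothetical helper; results are returned in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, **kw) for kw in kwargs_iter]
        return [future.result() for future in futures]


def toy_block(row_ids, col_ids):
    """Stand-in for a block computation (not scikit-bio's _block_compute)."""
    return [(r, c) for r in row_ids for c in col_ids]


# Each dict plays the role of the keyword arguments for one block.
blocks = [
    {"row_ids": [0, 1], "col_ids": [0, 1]},
    {"row_ids": [0, 1], "col_ids": [2, 3]},
]
results = threaded_kwargs_map(toy_block, blocks)
```

Under the stated assumption, a function of this shape could be supplied as map_f. Threads are used here only to keep the sketch self-contained; a process pool or cluster scheduler would follow the same pattern.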
- Returns
A distance matrix relating all samples represented by counts to each other.
- Return type
DistanceMatrix
Notes
This method is designed to facilitate computing beta diversity in parallel. In general, if you are processing a few hundred samples or fewer, skbio.diversity.beta_diversity is likely to be faster. The need that originally motivated this method was processing the Earth Microbiome Project [1] dataset, which at the time spanned over 25,000 samples and 7.5 million open reference OTUs.
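The idea of block decomposition can be illustrated with a small pure-Python sketch. This is not scikit-bio's implementation: all function names are hypothetical, and Manhattan distance stands in for a real beta diversity metric. Sample indices are split into chunks of at most k, each pair of chunks is computed independently (the map step), and the partial results are merged into one symmetric matrix (the reduce step).

```python
from itertools import combinations_with_replacement


def manhattan(u, v):
    """Illustrative symmetric metric; any callable metric could be used."""
    return sum(abs(a - b) for a, b in zip(u, v))


def index_blocks(n, k):
    """Split range(n) into consecutive chunks of at most k indices."""
    return [list(range(start, min(start + k, n))) for start in range(0, n, k)]


def compute_block(counts, rows, cols, metric):
    """Map step: distances for one (row chunk, column chunk) pair."""
    return {(i, j): metric(counts[i], counts[j])
            for i in rows for j in cols if i < j}


def reduce_blocks(partials, n):
    """Reduce step: merge partial results into a full symmetric matrix."""
    full = [[0] * n for _ in range(n)]
    for partial in partials:
        for (i, j), dist in partial.items():
            full[i][j] = full[j][i] = dist
    return full


def block_distances(counts, metric, k):
    n = len(counts)
    chunks = index_blocks(n, k)
    # The metric is symmetric, so only the upper triangle of chunk pairs is needed.
    pairs = combinations_with_replacement(chunks, 2)
    partials = (compute_block(counts, rows, cols, metric) for rows, cols in pairs)
    return reduce_blocks(partials, n)


counts = [[1, 0, 3], [0, 2, 3], [4, 4, 0], [1, 1, 1], [0, 0, 5]]
blocked = block_distances(counts, manhattan, k=2)
# Reference result computed directly, without blocks.
direct = [[manhattan(u, v) for v in counts] for u in counts]
```

Because each call to compute_block depends only on its own chunk pair, the calls could be distributed across workers. For small inputs the bookkeeping outweighs any gain, which is consistent with the guidance above about a few hundred samples.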
References