skbio.diversity.alpha.gini_index
- skbio.diversity.alpha.gini_index(data, method='rectangles')
Calculate the Gini index.
State: Experimental as of 0.4.0.
The Gini index is defined as
\[G=\frac{A}{A+B}\]
where \(A\) is the area between \(y=x\) and the Lorenz curve and \(B\) is the area under the Lorenz curve. Since \(A+B=0.5\), this simplifies to \(G=1-2B\).
- Parameters
data (1-D array_like) – Vector of counts, abundances, proportions, etc. All entries must be non-negative.
method ({'rectangles', 'trapezoids'}) – Method for calculating the area under the Lorenz curve. If 'rectangles', connects the Lorenz curve points by lines parallel to the x axis. This is the correct method (in our opinion), though 'trapezoids' might be desirable in some circumstances. If 'trapezoids', connects the Lorenz curve points by linear segments between them. This assumes that the given sampling is accurate and that additional features would fall on the linear gradients between the values in this data.
- Returns
Gini index.
- Return type
double
- Raises
ValueError – If method isn’t one of the supported methods for calculating the area under the curve.
Notes
The Gini index was introduced in [1]. The formula for method='rectangles' is
\[dx\sum_{i=1}^n h_i\]
The formula for method='trapezoids' is
\[dx\left(\frac{h_0+h_n}{2}+\sum_{i=1}^{n-1} h_i\right)\]
References
[1] Gini, C. (1912). “Variability and Mutability”, C. Cuppini, Bologna, 156 pages. Reprinted in Memorie di metodologica statistica (Eds. Pizetti, E. and Salvemini, T.). Rome: Libreria Eredi Virgilio Veschi (1955).
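Examples
The two area formulas in the Notes can be sketched in plain Python. This is an illustration only, not scikit-bio's implementation. The names lorenz_heights and gini_sketch are hypothetical, and the sketch assumes that \(h_i\) is the cumulative proportion of the sorted data at the \(i\)-th Lorenz curve point, that \(h_0=0\), that \(dx=1/n\), and that the result is clamped at zero.

```python
from itertools import accumulate


def lorenz_heights(data):
    """Return assumed Lorenz curve heights h_1..h_n for non-negative data.

    Assumes at least one entry is positive, so the total is non-zero.
    """
    ordered = sorted(data)
    total = sum(ordered)
    # Cumulative share of the total held by the i smallest entries.
    return [c / total for c in accumulate(ordered)]


def gini_sketch(data, method="rectangles"):
    """Estimate the Gini index as 1 - 2B (hypothetical helper)."""
    heights = lorenz_heights(data)
    n = len(heights)
    dx = 1 / n  # assumption: Lorenz curve points are evenly spaced

    if method == "rectangles":
        # dx * sum_{i=1}^{n} h_i
        area = dx * sum(heights)
    elif method == "trapezoids":
        h_0 = 0.0  # assumption: the curve starts at the origin
        h_n = heights[-1]
        # dx * ((h_0 + h_n) / 2 + sum_{i=1}^{n-1} h_i)
        area = dx * ((h_0 + h_n) / 2 + sum(heights[:-1]))
    else:
        raise ValueError(f"unsupported method: {method!r}")

    # Assumption of this sketch: clamp at zero, because the rectangles
    # estimate of B can exceed 0.5 for very even data.
    return max(0.0, 1 - 2 * area)
```

Under these assumptions, gini_sketch([0, 0, 0, 1]) gives 0.5 with 'rectangles' and 0.75 with 'trapezoids', while perfectly even data such as [1, 1, 1, 1] gives 0.0 with either method.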