nachos.constraints package
Submodules
nachos.constraints.Constraints module
- class nachos.constraints.Constraints.Constraints(fns, weights)[source]
Bases:
object
- __call__(u, s, n=None)[source]
- Summary:
This function computes the incompatibility score of a proposed split s according to the predefined constraints (self.fns).
- Parameters
u (Dataset) – A dataset
s (Tuple[set, set]) – A proposed split
n (Optional[int]) – The index of the constraint with respect to which to compute the incompatibility score. None means compute the weighted sum of all constraints.
- Returns
The incompatibility score
- Return type
float
- stats(u, s)[source]
Compute the “stats” associated with each constraint on the split.
- Parameters
u (Dataset) – The Dataset from which a subset is drawn
s (set) – The proposed subset of the dataset
- Returns
dictionary of the scores for the set s according to the constraints specified in this class
- Return type
dict
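The weighted-sum scoring performed by __call__ can be sketched in plain Python. This is a hypothetical standalone version: the real class extracts the constrained values from a Dataset and a proposed split rather than taking value lists directly, and `score`, `mean_diff`, and `sum_diff` are illustrative names, not nachos API.

```python
# Sketch of the weighted-sum scoring in Constraints.__call__ (illustration
# only, not the nachos implementation). `fns` are constraint functions of
# two value lists, `weights` their weights; `n` selects one constraint.

def score(fns, weights, c1, c2, n=None):
    """Return the n-th constraint's score, or the weighted sum of all."""
    if n is not None:
        return fns[n](c1, c2)
    return sum(w * f(c1, c2) for f, w in zip(fns, weights))

# Two toy constraints in the spirit of the Mean and Sum modules below
mean_diff = lambda a, b: abs(sum(a) / len(a) - sum(b) / len(b))
sum_diff = lambda a, b: abs(sum(a) - sum(b))

total = score([mean_diff, sum_diff], [1.0, 0.5], [1, 2, 3], [4, 4])
```

With c1 = [1, 2, 3] and c2 = [4, 4], the mean difference is 2 and the sum difference is 2, so the weighted total is 1.0·2 + 0.5·2 = 3.0.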
nachos.constraints.abstract_constraint module
nachos.constraints.kl module
- class nachos.constraints.kl.KL(smooth=1e-06, direction='forward')[source]
Bases:
nachos.constraints.abstract_constraint.AbstractConstraint
- Summary:
Defines the constraint on the categorical distribution over values between two datasets. The cost of mismatch is computed as the KL divergence between the two sets. In general, the smaller set is the test set, and we would like it to have specific characteristics with respect to the larger (training) set. The forward KL, i.e.,
\[kl\left(p \vert\vert q_\theta\right)\]is mean seeking
cost = KL(d1_train || d2_test)
This will encourage selecting data with good coverage of the dataset, including data points that may have been seen only occasionally in the training data. See ReverseKL and Jeffreys for more information.
Reverse KL is
\[kl\left(q_\theta \vert\vert p\right)\]cost = KL(d2_test || d1_train)
This encourages mode seeking behavior.
The Jeffreys divergence symmetrizes the KL divergence as
\[\frac{1}{2}\left[KL\left(p \vert\vert q_\theta\right) + KL\left(q_\theta \vert\vert p\right)\right]\]
- __call__(c1, c2)[source]
- Summary:
Computes the KL divergence between the empirical distributions defined by the values in c1 and the values in c2.
- Parameters
c1 (Union[list, Generator]) – the values to constrain seen in dataset 1
c2 (Union[list, Generator]) – the values to constrain seen in dataset 2
- Returns
how closely (0 is best) the sets c1, c2 satisfy the constraint
- Return type
float
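A smoothed empirical-KL computation matching this description might look like the following. This is a sketch under the assumption that `smooth` is used as an additive smoothing constant over the union support of the two value lists; the actual nachos implementation may smooth differently.

```python
from collections import Counter
import math

def kl_divergence(c1, c2, smooth=1e-6, direction='forward'):
    """Smoothed KL divergence between the empirical categorical
    distributions of two value lists (illustrative sketch)."""
    support = set(c1) | set(c2)
    n1, n2 = Counter(c1), Counter(c2)
    # Add-smooth both distributions so unseen categories keep mass > 0
    p = {v: (n1[v] + smooth) / (len(c1) + smooth * len(support)) for v in support}
    q = {v: (n2[v] + smooth) / (len(c2) + smooth * len(support)) for v in support}
    if direction == 'reverse':
        p, q = q, p
    return sum(p[v] * math.log(p[v] / q[v]) for v in support)
```

Identical value lists give a divergence of (numerically) zero, and swapping the arguments of the forward direction recovers the reverse direction.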
nachos.constraints.mean module
- class nachos.constraints.mean.Mean[source]
Bases:
nachos.constraints.abstract_constraint.AbstractConstraint
- Summary:
Defines a constraint on the mean value of a factor. The constraint is that the mean between two Datasets (defined by the Dataset class) should be the same. This class just computes the difference between the means and returns that as a float. Instead of working with the Dataset class directly, this class works on the constraint values in that class.
- __call__(c1, c2)[source]
- Summary:
Computes
\[\lvert \frac{1}{|c1|} \sum c1 - \frac{1}{|c2|} \sum c2 \rvert\]
- Parameters
c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1
c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2
- Returns
the constraint score (how closely the constraint is met)
- Return type
float
nachos.constraints.mean_tuple module
- class nachos.constraints.mean_tuple.MeanTuple(s1_mean, s2_mean)[source]
Bases:
nachos.constraints.mean.Mean
- Summary:
Defines the constraint on the mean value of a factor. The constraint is that the mean of each of the two datasets should be close to its own specified target value (s1_mean, s2_mean).
- __call__(c1, c2)[source]
- Summary:
Given a tuple
\[\mu = \left(\mu_1, \mu_2\right)\]compute
\[\lvert \frac{1}{|c1|} \sum c1 - \mu_1 \rvert + \lvert \frac{1}{|c2|} \sum c2 - \mu_2 \rvert\]
- Parameters
c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1
c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2
- Returns
the constraint score (how closely the constraint is met)
- Return type
float
nachos.constraints.sum module
- class nachos.constraints.sum.Sum[source]
Bases:
nachos.constraints.abstract_constraint.AbstractConstraint
- Summary:
Defines the constraint on the sum of a factor. The constraint is that the sums of the values in the two datasets should be the same; the score is the absolute difference of the sums.
- __call__(c1, c2)[source]
- Summary:
Computes
\[\lvert \sum c1 - \sum c2 \rvert\]
- Parameters
c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1
c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2
- Returns
the constraint score (how closely the constraint is met)
- Return type
float
nachos.constraints.sum_tuple module
- class nachos.constraints.sum_tuple.SumTuple(s1_sum, s2_sum)[source]
Bases:
nachos.constraints.sum.Sum
- Summary:
Defines the constraint on the sum of a factor. The constraint is that the sum of each of the two datasets should be close to its own specified target value (s1_sum, s2_sum).
- __call__(c1, c2)[source]
- Summary:
Given a tuple
\[s = \left(s_1, s_2\right)\]compute
\[\lvert \sum c1 - s_1 \rvert + \lvert \sum c2 - s_2 \rvert\]
- Parameters
c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1
c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2
- Returns
the constraint score (how closely the constraint is met)
- Return type
float
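A standalone sketch of this target-sum scoring (hypothetical function; the arguments s1_sum and s2_sum play the role of the constructor parameters of the same names):

```python
# |sum(c1) - s1_sum| + |sum(c2) - s2_sum|, as in SumTuple.__call__.
# Illustrative sketch only, not the nachos implementation.

def sum_tuple_constraint(c1, c2, s1_sum, s2_sum):
    return abs(sum(c1) - s1_sum) + abs(sum(c2) - s2_sum)
```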