nachos.constraints package
Submodules
nachos.constraints.Constraints module
- class nachos.constraints.Constraints.Constraints(fns, weights)[source]
Bases:
object
- __call__(u, s, n=None)[source]
- Summary:
This function computes the incompatibility score of a proposed split s according to the predefined constraints (self.fns).
- Parameters
u (Dataset) – A dataset
s (Tuple[set, set]) – A proposed split
n (Optional[int]) – The index of the constraint with respect to which to compute the incompatibility score. None means compute the weighted sum of all constraints.
- Returns
The incompatibility score
- Return type
float
- stats(u, s)[source]
Compute the “stats” associated with each constraint on the split.
- Parameters
u (Dataset) – The Dataset from which a subset is drawn
s (set) – The proposed subset of the dataset
- Returns
dictionary of the scores for the set s according to the constraints specified in this class
- Return type
dict
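The weighted-sum scoring performed by __call__ can be sketched in plain Python. This is a hypothetical standalone version: the real class extracts the constrained values from a Dataset and a proposed split rather than taking value lists directly, and `score`, `mean_diff`, and `sum_diff` are illustrative names, not nachos API.

```python
# Sketch of the weighted-sum scoring in Constraints.__call__ (illustration
# only, not the nachos implementation). `fns` are constraint functions of
# two value lists, `weights` their weights; `n` selects one constraint.

def score(fns, weights, c1, c2, n=None):
    """Return the n-th constraint's score, or the weighted sum of all."""
    if n is not None:
        return fns[n](c1, c2)
    return sum(w * f(c1, c2) for f, w in zip(fns, weights))

# Two toy constraints in the spirit of the Mean and Sum modules below
mean_diff = lambda a, b: abs(sum(a) / len(a) - sum(b) / len(b))
sum_diff = lambda a, b: abs(sum(a) - sum(b))

total = score([mean_diff, sum_diff], [1.0, 0.5], [1, 2, 3], [4, 4])
```

With c1 = [1, 2, 3] and c2 = [4, 4], the mean difference is 2 and the sum difference is 2, so the weighted total is 1.0·2 + 0.5·2 = 3.0.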
nachos.constraints.abstract_constraint module
nachos.constraints.kl module
- class nachos.constraints.kl.KL(smooth=1e-06, direction='forward')[source]
Bases:
nachos.constraints.abstract_constraint.AbstractConstraint
- Summary:
Defines the constraint on the categorical distribution over values between two datasets. The cost of mismatch is computed as the KL divergence between the two sets. In general, the smaller set is the test set, and we would like it to have specific characteristics with respect to the larger (training) set. The forward KL, i.e.,
\[kl\left(p \vert\vert q_\theta\right)\]is mean seeking
cost = KL(d1_train || d2_test)
This will encourage selecting data with good coverage of the dataset, including data points that may have been seen only occasionally in the training data. See ReverseKL and Jeffreys for more information.
Reverse KL is
\[kl\left(q_\theta \vert\vert p\right)\]cost = KL(d2_test || d1_train)
This encourages mode seeking behavior.
The Jeffreys divergence symmetrizes the KL divergence as
\[\frac{1}{2}\left[KL\left(p \vert\vert q_\theta\right) + KL\left(q_\theta \vert\vert p\right)\right]\]
- __call__(c1, c2)[source]
- Summary:
Computes the KL divergence between the empirical distributions defined by the values in c1 and the values in c2.
- Parameters
c1 (Union[list, Generator]) – the values to constrain seen in dataset 1
c2 (Union[list, Generator]) – the values to constrain seen in dataset 2
- Returns
how closely (0 is best) the sets c1, c2 satisfy the constraint
- Return type
float
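A smoothed empirical-KL computation matching this description might look like the following. This is a sketch under the assumption that `smooth` is used as an additive smoothing constant over the union support of the two value lists; the actual nachos implementation may smooth differently.

```python
from collections import Counter
import math

def kl_divergence(c1, c2, smooth=1e-6, direction='forward'):
    """Smoothed KL divergence between the empirical categorical
    distributions of two value lists (illustrative sketch)."""
    support = set(c1) | set(c2)
    n1, n2 = Counter(c1), Counter(c2)
    # Add-smooth both distributions so unseen categories keep mass > 0
    p = {v: (n1[v] + smooth) / (len(c1) + smooth * len(support)) for v in support}
    q = {v: (n2[v] + smooth) / (len(c2) + smooth * len(support)) for v in support}
    if direction == 'reverse':
        p, q = q, p
    return sum(p[v] * math.log(p[v] / q[v]) for v in support)
```

Identical value lists give a divergence of (numerically) zero, and swapping the arguments of the forward direction recovers the reverse direction.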
nachos.constraints.mean module
- class nachos.constraints.mean.Mean[source]
Bases:
nachos.constraints.abstract_constraint.AbstractConstraint
- Summary:
Defines a constraint on the mean value of a factor. The constraint is that the mean between two Datasets (defined by the Dataset class) should be the same. This class just computes the difference between the means and returns that as a float. Instead of working with the Dataset class directly, this class works on the constraint values in that class.
- __call__(c1, c2)[source]
- Summary:
Computes
\[\lvert \frac{1}{|c1|} \sum c1 - \frac{1}{|c2|} \sum c2 \rvert\]
- Parameters
c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1
c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2
- Returns
the constraint score (how closely the constraint is met)
- Return type
float
nachos.constraints.mean_tuple module
- class nachos.constraints.mean_tuple.MeanTuple(s1_mean, s2_mean)[source]
Bases:
nachos.constraints.mean.Mean
- Summary:
Defines the constraint on the mean value of a factor. The constraint is that the mean of each of the two datasets should be close to its own specified target value (s1_mean, s2_mean).
- __call__(c1, c2)[source]
- Summary:
Given a tuple
\[\mu = \left(\mu_1, \mu_2\right)\]compute
\[\lvert \frac{1}{|c1|} \sum c1 - \mu_1 \rvert + \lvert \frac{1}{|c2|} \sum c2 - \mu_2 \rvert\]
- Parameters
c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1
c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2
- Returns
the constraint score (how closely the constraint is met)
- Return type
float
nachos.constraints.sum module
- class nachos.constraints.sum.Sum[source]
Bases:
nachos.constraints.abstract_constraint.AbstractConstraint
- Summary:
Defines the constraint on the sum of a factor. The constraint is that the sums of the values in the two datasets should be the same; the score is the absolute difference of the sums.
- __call__(c1, c2)[source]
- Summary:
Computes
\[\lvert \sum c1 - \sum c2 \rvert\]
- Parameters
c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1
c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2
- Returns
the constraint score (how closely the constraint is met)
- Return type
float
nachos.constraints.sum_tuple module
- class nachos.constraints.sum_tuple.SumTuple(s1_sum, s2_sum)[source]
Bases:
nachos.constraints.sum.Sum
- Summary:
Defines the constraint on the sum of a factor. The constraint is that the sum of each of the two datasets should be close to its own specified target value (s1_sum, s2_sum).
- __call__(c1, c2)[source]
- Summary:
Given a tuple
\[s = \left(s_1, s_2\right)\]compute
\[\lvert \sum c1 - s_1 \rvert + \lvert \sum c2 - s_2 \rvert\]
- Parameters
c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1
c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2
- Returns
the constraint score (how closely the constraint is met)
- Return type
float
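A standalone sketch of this target-sum scoring (hypothetical function; the arguments s1_sum and s2_sum play the role of the constructor parameters of the same names):

```python
# |sum(c1) - s1_sum| + |sum(c2) - s2_sum|, as in SumTuple.__call__.
# Illustrative sketch only, not the nachos implementation.

def sum_tuple_constraint(c1, c2, s1_sum, s2_sum):
    return abs(sum(c1) - s1_sum) + abs(sum(c2) - s2_sum)
```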