nachos.constraints package

Submodules

nachos.constraints.Constraints module

class nachos.constraints.Constraints.Constraints(fns, weights)

Bases: object

classmethod build(conf)
__init__(fns, weights)
__call__(u, s, n=None)
Summary:

This function computes the incompatibility score of a split s according to the predefined constraints (self.fns).

Parameters
  • u (Dataset) – A dataset

  • s (Tuple[set, set]) – A proposed split

  • n (Optional[int]) – The index of the constraint with respect to which to compute the incompatibility score. None means compute the weighted sum of all constraints

Returns

The discompatibility score

Return type

float
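The weighted-sum behaviour described above can be sketched as follows. This is a minimal sketch, not the class's actual code: it assumes each constraint is a callable on a pair of value lists (as the constraint classes below are), that the values for each constraint have already been extracted from u and s (the hypothetical `values` argument), and that selecting one constraint with n returns its unweighted score. `combine_scores`, `mean_gap`, and `sum_gap` are illustrative names.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A constraint maps the values seen on the two sides of a split to a score (0 is best).
ConstraintFn = Callable[[Sequence[float], Sequence[float]], float]


def combine_scores(
    fns: List[ConstraintFn],
    weights: List[float],
    values: List[Tuple[Sequence[float], Sequence[float]]],
    n: Optional[int] = None,
) -> float:
    """Hypothetical helper: score a split from pre-extracted constraint values.

    values[i] holds the (c1, c2) values that constraint fns[i] compares.
    """
    if n is not None:
        # Assumption: a single selected constraint is scored without its weight.
        c1, c2 = values[n]
        return fns[n](c1, c2)
    # Weighted sum over all constraints.
    return sum(w * fn(c1, c2) for fn, w, (c1, c2) in zip(fns, weights, values))


def mean_gap(c1: Sequence[float], c2: Sequence[float]) -> float:
    return abs(sum(c1) / len(c1) - sum(c2) / len(c2))


def sum_gap(c1: Sequence[float], c2: Sequence[float]) -> float:
    return float(abs(sum(c1) - sum(c2)))


values = [([1, 2, 3], [2, 4]), ([1, 1], [3])]
print(combine_scores([mean_gap, sum_gap], [1.0, 0.5], values))  # 1.5
print(combine_scores([mean_gap, sum_gap], [1.0, 0.5], values, n=1))  # 1.0
```

Because each term is non-negative with 0 as the best value, the weights control how much a mismatch on one constraint can be traded against a mismatch on another.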

stats(u, s)

Compute the “stats” associated with each constraint on the split.

Parameters
  • u (Dataset) – The Dataset from which a subset is drawn

  • s (set) – The proposed subset of the dataset

Returns

dictionary of the scores for the set s according to the constraints specified in this class

Return type

dict

nachos.constraints.abstract_constraint module

class nachos.constraints.abstract_constraint.AbstractConstraint

Bases: abc.ABC

abstract classmethod build(conf)
__init__()
abstract __call__(c1, c2)

Compute the constraint score for the values c1 and c2 seen in two datasets. Subclasses define how the score is computed.

Return type

float
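A subclass has to provide both abstract members before it can be instantiated. The sketch below shows that contract with Python's `abc` module; the configuration dict passed to `build` and the `MaxGap` constraint are hypothetical and only illustrate the pattern.

```python
from abc import ABC, abstractmethod
from typing import Generator, Union

Values = Union[list, Generator]


class AbstractConstraint(ABC):
    @classmethod
    @abstractmethod
    def build(cls, conf: dict) -> "AbstractConstraint":
        """Construct the constraint from a configuration dict (layout assumed)."""

    @abstractmethod
    def __call__(self, c1: Values, c2: Values) -> float:
        """Return how far c1 and c2 are from satisfying the constraint."""


class MaxGap(AbstractConstraint):
    """Hypothetical constraint: absolute difference between the two maxima."""

    @classmethod
    def build(cls, conf: dict) -> "MaxGap":
        return cls()  # this toy constraint has no parameters

    def __call__(self, c1: Values, c2: Values) -> float:
        return float(abs(max(c1) - max(c2)))


print(MaxGap.build({})([1, 5], [2, 3]))  # 2.0
```

Calling `AbstractConstraint()` directly raises `TypeError`, since `build` and `__call__` are still abstract.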

nachos.constraints.kl module

class nachos.constraints.kl.KL(smooth=1e-06, direction='forward')

Bases: nachos.constraints.abstract_constraint.AbstractConstraint

Summary:

Defines a constraint on the categorical distribution of values in two datasets. The cost of a mismatch is computed as the KL divergence between the two sets. In general, the smaller set is the test set, and we would like it to have specific characteristics with respect to the larger (training) set.

The forward KL, where p is the distribution of values in the training set (d1) and q_\theta the distribution in the test set (d2), i.e.,

\[KL\left(p \vert\vert q_\theta\right)\]

is mean-seeking:

cost = KL(d1_train || d2_test)

This encourages selecting data with good coverage of the dataset, including data points that may have been seen only occasionally in the training data. See the reverse KL and the Jeffreys divergence below for alternatives.

The reverse KL is

\[KL\left(q_\theta \vert\vert p\right)\]

cost = KL(d2_test || d1_train)

This encourages mode-seeking behavior.

The Jeffreys divergence symmetrizes the KL divergence as

\[\frac{1}{2}\left[KL\left(p \vert\vert q_\theta\right) + KL\left(q_\theta \vert\vert p\right)\right]\]
classmethod build(conf)
__init__(smooth=1e-06, direction='forward')
__call__(c1, c2)
Summary:

Computes the KL divergence between the empirical distributions defined by the values in c1 and the values in c2.

Parameters
  • c1 (Union[list, Generator]) – the values to constrain seen in dataset 1

  • c2 (Union[list, Generator]) – the values to constrain seen in dataset 2

Returns

how closely (0 is best) the sets c1, c2 satisfy the constraint

Return type

float
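The computation can be sketched as below. This is a minimal sketch under stated assumptions, not the class's code: smoothing is taken to mean adding `smooth` to every count over the union of observed values before normalizing, and the direction strings `"reverse"` and `"jeffreys"` are hypothetical (only `'forward'` appears in the signature above). The Jeffreys case uses the 1/2-weighted form given in the summary.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Set


def smoothed_distribution(
    values: Iterable[Hashable], support: Set[Hashable], smooth: float
) -> Dict[Hashable, float]:
    """Empirical distribution over `support` with additive smoothing (assumed scheme)."""
    counts = Counter(values)
    total = sum(counts[v] + smooth for v in support)
    return {v: (counts[v] + smooth) / total for v in support}


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """KL(p || q) in nats; terms with p(v) == 0 contribute nothing."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p if p[v] > 0)


def kl_constraint(c1, c2, smooth: float = 1e-6, direction: str = "forward") -> float:
    c1, c2 = list(c1), list(c2)  # materialize generators so they can be read twice
    support = set(c1) | set(c2)
    p = smoothed_distribution(c1, support, smooth)  # dataset 1 (e.g. training)
    q = smoothed_distribution(c2, support, smooth)  # dataset 2 (e.g. test)
    if direction == "forward":
        return kl_divergence(p, q)
    if direction == "reverse":  # hypothetical direction name
        return kl_divergence(q, p)
    if direction == "jeffreys":  # hypothetical direction name
        return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    raise ValueError(f"unknown direction: {direction!r}")


print(kl_constraint(["a", "b"], ["a", "a", "a", "b"]))  # about 0.1438
```

Smoothing matters because a value present in c1 but absent from c2 would otherwise give q(v) = 0 and an infinite forward KL.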

nachos.constraints.mean module

class nachos.constraints.mean.Mean

Bases: nachos.constraints.abstract_constraint.AbstractConstraint

Summary:

Defines a constraint on the mean value of a factor. The constraint is that the means of two Datasets (defined by the Dataset class) should be the same. This class computes the absolute difference between the two means and returns it as a float. Instead of working with the Dataset class directly, it operates on the constraint values stored in that class.

classmethod build(conf)
__call__(c1, c2)
Summary:

Computes

\[\lvert \frac{1}{|c1|} \sum c1 - \frac{1}{|c2|} \sum c2 \rvert\]
Parameters
  • c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1

  • c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2

Returns

the constraint score (how closely the constraint is met; 0 is best)

Return type

float

stat(c1)
Summary:

Computes the mean of the values in c1.

Parameters

c1 (Union[list, Generator]) – the list of values over which to compute the mean

Return type

float
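The formula and the stat method above amount to a few lines of Python. This is a minimal function-based sketch, not the class itself; `mean_stat` and `mean_constraint` are hypothetical names, and it assumes Python 3.8+ for `statistics.fmean`, which accepts lists and generators alike.

```python
from statistics import fmean
from typing import Iterable


def mean_stat(c1: Iterable[float]) -> float:
    """Mean of the values in c1 (mirrors Mean.stat)."""
    return fmean(c1)


def mean_constraint(c1: Iterable[float], c2: Iterable[float]) -> float:
    """|mean(c1) - mean(c2)|; 0 means the two means match exactly."""
    return abs(mean_stat(c1) - mean_stat(c2))


print(mean_constraint([1, 2, 3], [4, 6]))  # 3.0
print(mean_constraint((x for x in [2, 4]), [3]))  # 0.0
```

Because it compares means, this score does not depend on how many items each side of the split contains.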

nachos.constraints.mean_tuple module

class nachos.constraints.mean_tuple.MeanTuple(s1_mean, s2_mean)

Bases: nachos.constraints.mean.Mean

Summary:

Defines a constraint on the mean value of a factor. The constraint is that the mean of each of the two datasets should be close to its own specified value (s1_mean and s2_mean, respectively).

classmethod build(conf)
__init__(s1_mean, s2_mean)
__call__(c1, c2)
Summary:

Given the tuple of target means

\[\mu = \left(\mu_1, \mu_2\right)\]

compute

\[\lvert \frac{1}{|c1|} \sum c1 - \mu_1 \rvert + \lvert \frac{1}{|c2|} \sum c2 - \mu_2 \rvert\]
Parameters
  • c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1

  • c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2

Returns

the constraint score (how closely the constraint is met; 0 is best)

Return type

float
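Since MeanTuple subclasses Mean, one natural structure is to reuse the inherited mean statistic and compare each side with its own target. The sketch below assumes that structure; `MeanGap` and `MeanTupleGap` are hypothetical stand-ins, not the nachos classes, and `statistics.fmean` requires Python 3.8+.

```python
from statistics import fmean
from typing import Iterable


class MeanGap:
    """Hypothetical stand-in for Mean: stat() is the mean of the values."""

    def stat(self, c1: Iterable[float]) -> float:
        return fmean(c1)

    def __call__(self, c1: Iterable[float], c2: Iterable[float]) -> float:
        return abs(self.stat(c1) - self.stat(c2))


class MeanTupleGap(MeanGap):
    """Hypothetical stand-in for MeanTuple: compares each mean with its own target."""

    def __init__(self, s1_mean: float, s2_mean: float) -> None:
        self.s1_mean = s1_mean  # mu_1
        self.s2_mean = s2_mean  # mu_2

    def __call__(self, c1: Iterable[float], c2: Iterable[float]) -> float:
        # |mean(c1) - mu_1| + |mean(c2) - mu_2|, reusing the inherited stat()
        return abs(self.stat(c1) - self.s1_mean) + abs(self.stat(c2) - self.s2_mean)


print(MeanTupleGap(2.5, 4.0)([1, 2, 3], [4, 6]))  # 1.5
```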

nachos.constraints.sum module

class nachos.constraints.sum.Sum

Bases: nachos.constraints.abstract_constraint.AbstractConstraint

Summary:

Defines a constraint on the sum of a factor's values. The constraint is that the sums over the two datasets should be the same; this class returns the absolute difference between the two sums as a float.

classmethod build(conf)
__call__(c1, c2)
Summary:

Computes

\[\lvert \sum c1 - \sum c2 \rvert\]
Parameters
  • c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1

  • c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2

Returns

the constraint score (how closely the constraint is met; 0 is best)

Return type

float

stat(c1)
Summary:

Computes the sum of the values in c1.

Parameters

c1 (Union[list, Generator]) – the list of values over which to compute the sum

Return type

float
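A minimal function-based sketch of the formula and the stat method above; `sum_stat` and `sum_constraint` are hypothetical names, not the class's API.

```python
from typing import Iterable


def sum_stat(c1: Iterable[float]) -> float:
    """Sum of the values in c1 (mirrors Sum.stat)."""
    return float(sum(c1))


def sum_constraint(c1: Iterable[float], c2: Iterable[float]) -> float:
    """|sum(c1) - sum(c2)|; 0 means the two totals match exactly."""
    return abs(sum_stat(c1) - sum_stat(c2))


print(sum_constraint([1, 2, 3], [4, 6]))  # 4.0
```

Unlike the mean constraint, this score grows with the number of items on each side, so it also pushes the two subsets toward similar sizes when the per-item values are comparable.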

nachos.constraints.sum_tuple module

class nachos.constraints.sum_tuple.SumTuple(s1_sum, s2_sum)

Bases: nachos.constraints.sum.Sum

Summary:

Defines a constraint on the sum of a factor's values. The constraint is that the sum for each of the two datasets should be close to its own specified value (s1_sum and s2_sum, respectively).

classmethod build(conf)
__init__(s1_sum, s2_sum)
__call__(c1, c2)
Summary:

Given the tuple of target sums

\[\mu = \left(\mu_1, \mu_2\right)\]

compute

\[\lvert \sum c1 - \mu_1 \rvert + \lvert \sum c2 - \mu_2 \rvert\]
Parameters
  • c1 (Union[list, Generator]) – the list of values to constrain associated with dataset 1

  • c2 (Union[list, Generator]) – the list of values to constrain associated with dataset 2

Returns

the constraint score (how closely the constraint is met; 0 is best)

Return type

float
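Mirroring the MeanTuple case, SumTuple can reuse the inherited sum statistic and compare each side with its own target. This is a sketch under that assumption; `SumGap` and `SumTupleGap` are hypothetical stand-ins for the nachos classes.

```python
from typing import Iterable


class SumGap:
    """Hypothetical stand-in for Sum: stat() is the sum of the values."""

    def stat(self, c1: Iterable[float]) -> float:
        return float(sum(c1))

    def __call__(self, c1: Iterable[float], c2: Iterable[float]) -> float:
        return abs(self.stat(c1) - self.stat(c2))


class SumTupleGap(SumGap):
    """Hypothetical stand-in for SumTuple: compares each sum with its own target."""

    def __init__(self, s1_sum: float, s2_sum: float) -> None:
        self.s1_sum = s1_sum  # mu_1
        self.s2_sum = s2_sum  # mu_2

    def __call__(self, c1: Iterable[float], c2: Iterable[float]) -> float:
        # |sum(c1) - mu_1| + |sum(c2) - mu_2|, reusing the inherited stat()
        return abs(self.stat(c1) - self.s1_sum) + abs(self.stat(c2) - self.s2_sum)


print(SumTupleGap(5, 12)([1, 2, 3], [4, 6]))  # 3.0
```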

Module contents

nachos.constraints.register(name)
nachos.constraints.build_constraints(conf)
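Neither function is documented here. A decorator-based registry is a common way to pair a `register(name)` decorator with a config-driven builder, and the sketch below shows that pattern only as an assumption: the registry dict, the config layout (a list of dicts with a `"name"` key), and the `SumGap` class are hypothetical, not taken from nachos.

```python
from typing import Callable, Dict, List, Type, TypeVar

T = TypeVar("T")

# Hypothetical registry mapping config names to constraint classes.
CONSTRAINT_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[Type[T]], Type[T]]:
    """Class decorator that records a constraint class under `name`."""

    def decorator(cls: Type[T]) -> Type[T]:
        if name in CONSTRAINT_REGISTRY:
            raise ValueError(f"constraint {name!r} is already registered")
        CONSTRAINT_REGISTRY[name] = cls
        return cls

    return decorator


def build_constraints(conf: List[dict]) -> list:
    """Instantiate each configured constraint through its build() classmethod."""
    return [CONSTRAINT_REGISTRY[entry["name"]].build(entry) for entry in conf]


@register("sum")
class SumGap:
    @classmethod
    def build(cls, conf: dict) -> "SumGap":
        return cls()

    def __call__(self, c1, c2) -> float:
        return float(abs(sum(c1) - sum(c2)))


fns = build_constraints([{"name": "sum"}])
print(fns[0]([1, 2], [4]))  # 1.0
```

Registering at class-definition time means new constraint types become available to the builder simply by importing the module that defines them.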