nachos.splitters package

Submodules

Abstract Splitter module

class nachos.splitters.abstract_splitter.AbstractSplitter(sim_fn, constraint_fn)[source]

Bases: abc.ABC

abstract classmethod build(conf)[source]
__init__(sim_fn, constraint_fn)[source]
abstract __call__(d)[source]

Call self as a function.

Return type

List[Dataset]

score(u, s)[source]
Return type

float
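
The interface above can be illustrated with a minimal, self-contained sketch. It does not import nachos: SketchSplitter and HalfSplitter are hypothetical stand-ins that only mirror the documented signatures, the conf keys are assumptions, and score(u, s) is left out because its semantics are not described here.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Sequence


class SketchSplitter(ABC):
    """Hypothetical mirror of the documented interface (not the nachos class)."""

    def __init__(self, sim_fn: Callable, constraint_fn: Callable):
        self.sim_fn = sim_fn
        self.constraint_fn = constraint_fn

    @classmethod
    @abstractmethod
    def build(cls, conf: Dict[str, Any]) -> "SketchSplitter":
        """Construct a splitter from a configuration mapping."""

    @abstractmethod
    def __call__(self, d: Sequence[Any]) -> List[List[Any]]:
        """Split the dataset d into a list of subsets."""


class HalfSplitter(SketchSplitter):
    """Toy subclass: cuts the dataset in the middle and ignores both callables."""

    @classmethod
    def build(cls, conf):
        # Assumption: conf carries the two callables under these keys.
        return cls(conf["sim_fn"], conf["constraint_fn"])

    def __call__(self, d):
        mid = len(d) // 2
        return [list(d[:mid]), list(d[mid:])]


splitter = HalfSplitter.build({"sim_fn": None, "constraint_fn": None})
parts = splitter([1, 2, 3, 4])
```

Because build and __call__ are abstract, instantiating SketchSplitter directly raises TypeError; only concrete subclasses can be constructed.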

Disconnected Components Splitter

Minimum Node Cut Splitter

class nachos.splitters.min_node_cut.MinNodeCut(sim_fn, constraints, max_iter=200, seed=0)[source]

Bases: nachos.splitters.abstract_splitter.AbstractSplitter

classmethod build(conf)[source]
__init__(sim_fn, constraints, max_iter=200, seed=0)[source]
__call__(d)[source]
Summary:

Given a dataset, split it by searching over minimum s-t node cuts, picking the source (s) and target (t) vertices that minimize the constraint cost function of the resulting split.

Parameters

d (Dataset) – The dataset to split

Returns

The dataset split and scores

Return type

Tuple[FactoredSplit, List[float]]
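
The search described in the summary can be sketched on a small graph. This is a brute-force illustration under stated assumptions, not the nachos implementation: the graph is a plain adjacency dict (how nachos builds its graph from sim_fn is not described here), min_st_node_cut enumerates node subsets by increasing size instead of using a flow algorithm, and toy_cost is a made-up cost that penalizes imbalance and discarded nodes.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def reachable(graph: Graph, start: str, removed: Set[str]) -> Set[str]:
    """Nodes reachable from start once the removed nodes are deleted."""
    seen, stack = {start}, [start]
    while stack:
        for nbr in graph[stack.pop()]:
            if nbr not in removed and nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen


def min_st_node_cut(graph: Graph, s: str, t: str) -> Optional[Set[str]]:
    """Smallest set of nodes (excluding s and t) whose removal separates s from t."""
    if t in graph[s]:
        return None  # adjacent vertices cannot be separated by deleting other nodes
    others = sorted(n for n in graph if n not in (s, t))
    for size in range(len(others) + 1):
        for cand in combinations(others, size):
            if t not in reachable(graph, s, set(cand)):
                return set(cand)
    return None


def best_cut_split(
    graph: Graph, cost: Callable[[Set[str], Set[str], Set[str]], float]
) -> Optional[Tuple[float, Set[str], Set[str], Set[str]]]:
    """Try every (s, t) pair and keep the cut whose split has the lowest cost."""
    best = None
    for s, t in combinations(sorted(graph), 2):
        cut = min_st_node_cut(graph, s, t)
        if cut is None:
            continue
        train = reachable(graph, s, cut)      # side containing s
        heldout = set(graph) - train - cut    # everything else except the cut
        c = cost(train, heldout, cut)
        if best is None or c < best[0]:
            best = (c, train, heldout, cut)
    return best


# Path graph a - b - c - d - e
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}


def toy_cost(train, heldout, cut):
    # Hypothetical cost: size imbalance plus number of discarded (cut) nodes.
    return abs(len(train) - len(heldout)) + len(cut)


result = best_cut_split(path, toy_cost)
```

On the path graph the lowest-cost split removes the middle node c, leaving {a, b} on one side and {d, e} on the other. The subset enumeration is exponential in the number of nodes, so it is only suitable for illustrating the idea.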

Random Search Splitter

class nachos.splitters.random.Random(sim_fn, constraints, max_iter=100000, seed=0)[source]

Bases: nachos.splitters.abstract_splitter.AbstractSplitter

classmethod build(conf)[source]
__init__(sim_fn, constraints, max_iter=100000, seed=0)[source]
__call__(d)[source]
Summary:

Given a dataset, split it with the Random splitter algorithm. Each candidate split (a train set and a heldout set) is drawn by selecting a random subset of values from each factor independently. The best-scoring split seen so far is tracked and returned at the end.

Parameters

d (Dataset) – The dataset to split

Returns

The dataset splits

Return type

FactoredSplit
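
The random search can be sketched as follows. This is a minimal illustration, not the nachos code: records are plain dicts, the way train and heldout are formed from the selected values follows the construction described for the VNS splitter on this page (an assumption for this class), and toy_cost is a hypothetical stand-in for the constraint cost.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, str]
Split = Tuple[List[Record], List[Record]]


def draw_split(records: Sequence[Record], factors: Sequence[str], rng: random.Random) -> Split:
    """Pick a random subset of values per factor, independently, and form a split."""
    chosen = {}
    for fac in factors:
        values = sorted({r[fac] for r in records})
        chosen[fac] = set(rng.sample(values, rng.randint(1, len(values))))
    # train: every factor value selected; heldout: no factor value selected
    train = [r for r in records if all(r[fac] in chosen[fac] for fac in factors)]
    heldout = [r for r in records if all(r[fac] not in chosen[fac] for fac in factors)]
    return train, heldout


def random_search(
    records: Sequence[Record],
    factors: Sequence[str],
    cost: Callable[[List[Record], List[Record]], float],
    max_iter: int = 1000,
    seed: int = 0,
) -> Tuple[Optional[Split], float]:
    """Draw max_iter random splits and return the one with the lowest cost."""
    rng = random.Random(seed)
    best_split, best_cost = None, float("inf")
    for _ in range(max_iter):
        split = draw_split(records, factors, rng)
        c = cost(*split)
        if c < best_cost:
            best_split, best_cost = split, c
    return best_split, best_cost


records = [{"speaker": s, "room": r} for s in ("s1", "s2", "s3", "s4") for r in ("r1", "r2")]
factors = ["speaker", "room"]


def toy_cost(train, heldout):
    # Hypothetical cost: reject empty sides, otherwise aim for a 3:1 size ratio.
    if not train or not heldout:
        return float("inf")
    return abs(len(train) - 3 * len(heldout))


(train, heldout), best_cost = random_search(records, factors, toy_cost, max_iter=500, seed=0)
```

Records that mix selected and unselected values fall into neither set, so the two sides never share a speaker or a room. Fixing seed makes the search reproducible.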

Spectral Clustering Splitter

Variable Neighborhood Search (VNS) Splitter

class nachos.splitters.vns.VNS(sim_fn, constraints, num_shake_neighborhoods=4, num_search_neighborhoods=10, max_iter=200, max_neighbors=2000, seed=0)[source]

Bases: nachos.splitters.abstract_splitter.AbstractSplitter

classmethod build(conf)[source]
__init__(sim_fn, constraints, num_shake_neighborhoods=4, num_search_neighborhoods=10, max_iter=200, max_neighbors=2000, seed=0)[source]
__call__(d)[source]
Summary:

Given a dataset, split it using a Variable Neighborhood Search method over feasible solutions. Feasible solutions are constructed by selecting a subset of values from each factor independently, collecting for each factor the set of all data points associated with the selected values, and then intersecting these sets. The intersection of these sets is guaranteed to be disjoint from the intersection of their complements.

Parameters

d (Dataset) – The dataset to split

Returns

The dataset splits and scores

Return type

Tuple[FactoredSplit, List[float]]
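
A basic VNS loop over such feasible solutions might look like the sketch below. It is an illustration under stated assumptions, not the nachos implementation: a solution is a set of selected (factor, value) pairs, the k-th shake neighborhood flips k random pairs, the local search is a first-improvement descent over single flips, and k_max only loosely corresponds to num_shake_neighborhoods. The cost function toy_cost is hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Record = Dict[str, str]
Pair = Tuple[str, str]  # (factor, value)


def make_split(records: Sequence[Record], factors: Sequence[str], chosen: FrozenSet[Pair]):
    """Intersect the per-factor selections (train) and their complements (heldout)."""
    train = [r for r in records if all((fac, r[fac]) in chosen for fac in factors)]
    heldout = [r for r in records if all((fac, r[fac]) not in chosen for fac in factors)]
    return train, heldout


def local_search(chosen: FrozenSet[Pair], universe: List[Pair], objective: Callable) -> FrozenSet[Pair]:
    """First-improvement descent: flip one pair at a time while the cost drops."""
    improved = True
    while improved:
        improved = False
        for pair in universe:
            cand = chosen.symmetric_difference([pair])
            if objective(cand) < objective(chosen):
                chosen, improved = cand, True
                break
    return chosen


def vns(records, factors, cost, k_max=4, max_iter=50, seed=0):
    rng = random.Random(seed)
    universe = sorted({(fac, r[fac]) for r in records for fac in factors})

    def objective(chosen):
        return cost(*make_split(records, factors, chosen))

    best = local_search(frozenset(), universe, objective)
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            # Shake: flip k random pairs, then descend from the perturbed point.
            flips = rng.sample(universe, min(k, len(universe)))
            cand = local_search(best.symmetric_difference(flips), universe, objective)
            if objective(cand) < objective(best):
                best, k = cand, 1   # improvement: restart from the first neighborhood
            else:
                k += 1              # no improvement: widen the neighborhood
    return make_split(records, factors, best), objective(best)


records = [{"speaker": s, "room": r} for s in ("s1", "s2", "s3", "s4") for r in ("r1", "r2")]
factors = ["speaker", "room"]


def toy_cost(train, heldout):
    # Hypothetical cost: distance from a 6 / 2 split of the 8 records.
    return abs(6 - len(train)) + abs(2 - len(heldout))


(train, heldout), final_cost = vns(records, factors, toy_cost)
```

Every train record has all of its factor values selected and every heldout record has none selected, so the two sets cannot share a record or a factor value. This is the disjointness guarantee stated in the summary.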

Splitters

nachos.splitters.register(name)[source]
nachos.splitters.build_splitter(conf)[source]