nachos.splitters package

Submodules

Abstract Splitter module

class nachos.splitters.abstract_splitter.AbstractSplitter(sim_fn, constraint_fn)

Bases: abc.ABC

abstract classmethod build(conf)

__init__(sim_fn, constraint_fn)

abstract __call__(d)

Call self as a function.

Return type: List[Dataset]

score(u, s)

Return type: float

Disconnected Components Splitter

Minimum Node Cut Splitter

class nachos.splitters.min_node_cut.MinNodeCut(sim_fn, constraints, max_iter=200, seed=0)

Bases: nachos.splitters.abstract_splitter.AbstractSplitter

classmethod build(conf)

__init__(sim_fn, constraints, max_iter=200, seed=0)

__call__(d)

Summary: Given a dataset, split it by searching over minimum s-t node cuts, picking the source (s) and target (t) vertices whose cut minimizes the constraint cost function of the split.

Parameters: d (Dataset) – The dataset to split

Returns: The dataset split and scores
Return type: Tuple[FactoredSplit, List[float]]
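The search over s-t node cuts described above can be illustrated on a toy graph. The following is a minimal, self-contained sketch and not the nachos implementation: the helper names (reachable, min_st_node_cut, best_split), the adjacency-dict graph, and the toy cost function are all assumptions made for illustration, and the brute-force cut enumeration is only practical for very small graphs.

```python
from collections import deque
from itertools import combinations


def reachable(adj, start, removed):
    """Return the nodes reachable from `start` after deleting `removed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def min_st_node_cut(adj, s, t):
    """Brute-force a smallest node set whose removal separates s from t.

    Returns None when s and t are adjacent, since no node cut exists then.
    """
    if t in adj[s]:
        return None
    others = [n for n in adj if n not in (s, t)]
    for k in range(len(others) + 1):
        for cut in combinations(others, k):
            if t not in reachable(adj, s, set(cut)):
                return set(cut)
    return None


def best_split(adj, cost):
    """Try every (s, t) pair and keep the cut with the lowest cost."""
    best = None
    for s, t in combinations(sorted(adj), 2):
        cut = min_st_node_cut(adj, s, t)
        if cut is None:
            continue
        train = reachable(adj, s, cut)      # side of the cut containing s
        heldout = set(adj) - train - cut    # everything else, minus the cut
        c = cost(train, heldout, cut)
        if best is None or c < best[0]:
            best = (c, train, heldout, cut)
    return best


# Toy graph (hypothetical): two triangles joined through the node "d".
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}


def toy_cost(train, heldout, cut):
    """Hypothetical cost: size imbalance plus the number of discarded nodes."""
    return abs(len(train) - len(heldout)) + len(cut)


best = best_split(adj, toy_cost)
print(best)
```

In this sketch the nodes in the cut are discarded, so the two sides share no edge. How the real splitter maps cuts to a FactoredSplit and to constraint scores is not shown in this page.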

Random Search Splitter

class nachos.splitters.random.Random(sim_fn, constraints, max_iter=100000, seed=0)

Bases: nachos.splitters.abstract_splitter.AbstractSplitter

classmethod build(conf)

__init__(sim_fn, constraints, max_iter=100000, seed=0)

__call__(d)

Summary: Given a dataset, split it using the Random splitter algorithm. Random candidate splits (each consisting of a train set and a heldout set) are drawn by selecting a random subset of values from each factor independently. The best-scoring split seen so far is tracked, and that split is returned.

Parameters: d (Dataset) – The dataset to split

Returns: The dataset splits
Return type: FactoredSplit
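As a rough illustration of the random search described above, the sketch below draws a random subset of values per factor, builds a train/heldout pair from it, and keeps the lowest-cost pair. It is not the nachos code: the record layout (dicts keyed by factor name), the helper names draw_split and random_search, the rule for assigning records to train or heldout, and the toy cost are all assumptions.

```python
import random


def draw_split(records, factors, rng):
    """Pick a random subset of values per factor and derive a split.

    Assumes every factor has at least two distinct values.
    """
    chosen = {}
    for f in factors:
        values = sorted({r[f] for r in records})
        k = rng.randint(1, len(values) - 1)
        chosen[f] = set(rng.sample(values, k))
    # Train: every factor value was chosen. Heldout: none was chosen.
    # Records with a mix of chosen and unchosen values are discarded.
    train = [r for r in records if all(r[f] in chosen[f] for f in factors)]
    heldout = [r for r in records if all(r[f] not in chosen[f] for f in factors)]
    return train, heldout


def random_search(records, factors, cost, max_iter=1000, seed=0):
    """Return the lowest-cost split seen over `max_iter` random draws."""
    rng = random.Random(seed)
    best_cost, best = float("inf"), None
    for _ in range(max_iter):
        train, heldout = draw_split(records, factors, rng)
        c = cost(train, heldout)
        if c < best_cost:
            best_cost, best = c, (train, heldout)
    return best, best_cost


# Toy data (hypothetical): 4 speakers x 4 topics.
records = [{"speaker": f"s{i}", "topic": f"t{j}"}
           for i in range(4) for j in range(4)]
total = len(records)


def toy_cost(train, heldout):
    """Hypothetical cost: distance from a 50% train share plus discarded share."""
    discarded = total - len(train) - len(heldout)
    return abs(len(train) / total - 0.5) + discarded / total


best, best_cost = random_search(records, ["speaker", "topic"], toy_cost)
print(len(best[0]), len(best[1]), best_cost)
```

Fixing the seed makes the search reproducible, which is presumably the purpose of the seed parameter in the signature above.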

Spectral Clustering Splitter

Variable Neighborhood Search (VNS) splitter

class nachos.splitters.vns.VNS(sim_fn, constraints, num_shake_neighborhoods=4, num_search_neighborhoods=10, max_iter=200, max_neighbors=2000, seed=0)

Bases: nachos.splitters.abstract_splitter.AbstractSplitter

classmethod build(conf)

__init__(sim_fn, constraints, num_shake_neighborhoods=4, num_search_neighborhoods=10, max_iter=200, max_neighbors=2000, seed=0)

__call__(d)

Summary: Given a dataset, split it using a Variable Neighborhood Search method over feasible solutions. A feasible solution is constructed by selecting a subset of values from each factor independently, taking for each factor the set of all data points associated with the selected values, and then intersecting these sets. The intersection of these sets is guaranteed to be disjoint from the intersection of their complements.

Parameters: d (Dataset) – The dataset to split

Returns: The dataset splits and scores
Return type: Tuple[FactoredSplit, List[float]]
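The construction of feasible solutions and the basic shake-then-search loop can be sketched as follows. This is a simplified, generic VNS under stated assumptions, not the nachos implementation: the record layout, the names split_from, shake and vns, the "flip k memberships" neighborhood, the parameter k_max, and the toy cost are all hypothetical.

```python
import random


def split_from(records, chosen):
    """Train: all factor values chosen. Heldout: no factor value chosen."""
    train = [r for r in records
             if all(r[f] in vals for f, vals in chosen.items())]
    heldout = [r for r in records
               if all(r[f] not in vals for f, vals in chosen.items())]
    return train, heldout


def shake(chosen, universe, k, rng):
    """Copy `chosen` and flip the membership of k random (factor, value) pairs."""
    new = {f: set(vals) for f, vals in chosen.items()}
    pairs = [(f, v) for f in sorted(universe) for v in sorted(universe[f])]
    for f, v in rng.sample(pairs, k):
        new[f] ^= {v}  # add v if absent, remove it if present
    return new


def vns(records, factors, cost, k_max=3, max_iter=20, max_neighbors=50, seed=0):
    rng = random.Random(seed)
    universe = {f: {r[f] for r in records} for f in factors}
    current = {f: set() for f in factors}  # start with nothing selected
    current_cost = cost(*split_from(records, current))
    history = [current_cost]
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            # Shake: jump to a random point in the k-th neighborhood.
            candidate = shake(current, universe, k, rng)
            cand_cost = cost(*split_from(records, candidate))
            # Local search: sample single flips, keep any improvement.
            for _n in range(max_neighbors):
                neighbor = shake(candidate, universe, 1, rng)
                n_cost = cost(*split_from(records, neighbor))
                if n_cost < cand_cost:
                    candidate, cand_cost = neighbor, n_cost
            if cand_cost < current_cost:
                current, current_cost = candidate, cand_cost
                k = 1   # improvement: restart from the smallest neighborhood
            else:
                k += 1  # no improvement: widen the neighborhood
        history.append(current_cost)
    return split_from(records, current), history


# Toy data (hypothetical): 4 speakers x 4 topics.
records = [{"speaker": f"s{i}", "topic": f"t{j}"}
           for i in range(4) for j in range(4)]
total = len(records)


def toy_cost(train, heldout):
    """Hypothetical cost: distance from a 50% train share plus discarded share."""
    discarded = total - len(train) - len(heldout)
    return abs(len(train) / total - 0.5) + discarded / total


(vns_train, vns_heldout), history = vns(records, ["speaker", "topic"], toy_cost)
print(len(vns_train), len(vns_heldout), history[-1])
```

Because train records have every factor value inside the selected subsets and heldout records have every factor value outside them, the two sets cannot share a value of any factor, which is the disjointness property stated in the summary.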

Splitters

nachos.splitters.register(name)

nachos.splitters.build_splitter(conf)
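The signatures register(name) and build_splitter(conf), together with the build(conf) classmethods above, suggest a decorator-based registry, although this page does not document their behavior. The sketch below shows that general pattern only. It is an assumption, not the nachos code: the REGISTRY dict, the ToySplitter class, and the configuration keys "splitter" and "max_iter" are hypothetical.

```python
REGISTRY = {}  # hypothetical name -> class mapping


def register(name):
    """Class decorator that records a splitter class under `name`."""
    def decorator(cls):
        if name in REGISTRY:
            raise ValueError(f"splitter {name!r} is already registered")
        REGISTRY[name] = cls
        return cls
    return decorator


@register("toy")
class ToySplitter:
    """Stand-in class; a real splitter would derive from AbstractSplitter."""

    def __init__(self, max_iter):
        self.max_iter = max_iter

    @classmethod
    def build(cls, conf):
        # Assumed config layout: a flat dict holding constructor options.
        return cls(max_iter=conf["max_iter"])


def build_splitter(conf):
    """Look up the class named in the config and delegate to its build()."""
    return REGISTRY[conf["splitter"]].build(conf)


splitter = build_splitter({"splitter": "toy", "max_iter": 10})
print(type(splitter).__name__, splitter.max_iter)
```

With this pattern, adding a new splitter requires only decorating the class; the factory function does not need to change.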