nachos.splitters package
Submodules
Abstract Splitter module
Disconnected Components Splitter
Minimum Node Cut Splitter
- class nachos.splitters.min_node_cut.MinNodeCut(sim_fn, constraints, max_iter=200, seed=0)
Bases: nachos.splitters.abstract_splitter.AbstractSplitter
- __call__(d)
- Summary:
Given a dataset, split it by searching over minimum s-t node cuts, picking the source vertex s and target vertex t whose cut minimizes the constraint cost function of the split.
- Parameters
d (Dataset) – The dataset to split
- Returns
The dataset split and scores
- Return type
Tuple[FactoredSplit, List[float]]
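A minimal, self-contained sketch of the idea follows. It assumes the dataset has already been turned into an undirected similarity graph given as an adjacency dict, and it is not the nachos implementation: `min_st_node_cut`, `split_by_node_cut`, `search_node_cuts` and the cost function are hypothetical names for illustration. It computes a minimum s-t vertex cut with the standard vertex-splitting reduction to max flow (Edmonds-Karp), removes the cut, and keeps the (s, t) pair whose split has the lowest cost.

```python
# Hypothetical sketch of a minimum s-t node-cut search; not the nachos code.
import random
from collections import deque

INF = float("inf")


def min_st_node_cut(adj, s, t):
    """Smallest vertex set (excluding s and t) whose removal disconnects s from t.

    `adj` maps every vertex to the set of its neighbors (undirected, no
    self-loops). Returns None when s and t are adjacent, since no vertex cut
    can separate them then.
    """
    if t in adj[s]:
        return None
    cap = {}  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse edge for the residual graph

    # Vertex splitting: v becomes (v, 0) -> (v, 1); cutting that arc removes v.
    for v in adj:
        add_edge((v, 0), (v, 1), INF if v in (s, t) else 1)
        for u in adj[v]:
            add_edge((v, 1), (u, 0), INF)
    source, sink = (s, 1), (t, 0)
    while True:  # Edmonds-Karp: augment along BFS shortest paths
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # `parent` now holds everything reachable from the source
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
    # Cut vertices: the "in" copy is reachable but the "out" copy is not.
    return {v for v in adj if (v, 0) in parent and (v, 1) not in parent}


def split_by_node_cut(adj, s, t):
    """Remove the cut; s's component is one side, everything else the other."""
    cut = min_st_node_cut(adj, s, t)
    if cut is None:
        return None
    side, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in side and v not in cut:
                side.add(v)
                queue.append(v)
    return side, set(adj) - side - cut, cut


def search_node_cuts(adj, cost_fn, max_iter=200, seed=0):
    """Try random (s, t) pairs and keep the split with the lowest cost."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    best, best_cost = None, INF
    for _ in range(max_iter):
        s, t = rng.sample(nodes, 2)
        split = split_by_node_cut(adj, s, t)
        if split is None:
            continue
        cost = cost_fn(*split)
        if cost < best_cost:
            best, best_cost = split, cost
    return best, best_cost
```

For two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3, `split_by_node_cut(g, 0, 5)` returns `({0, 1}, {3, 4, 5}, {2})`. A cost such as `lambda a, b, cut: abs(len(a) - len(b)) + len(cut)` is only an illustrative stand-in for the real constraint cost.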
Random Search Splitter
- class nachos.splitters.random.Random(sim_fn, constraints, max_iter=100000, seed=0)
Bases: nachos.splitters.abstract_splitter.AbstractSplitter
- __call__(d)
- Summary:
Given a dataset, split it according to the Random splitter algorithm: repeatedly draw a random split (a train and a held-out set), keep track of the one with the best score, and return it. Each random split is drawn by selecting a random subset of values from each factor independently.
- Parameters
d (Dataset) – The dataset to split
- Returns
The dataset splits
- Return type
FactoredSplit
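A rough sketch of this random search, assuming each data point is a tuple of factor values and that, as described for the VNS splitter, the train side keeps points whose every factor value was selected while the held-out side keeps points whose every value was not (mixed points are dropped). The names `split_from_selection`, `example_cost` and `random_search`, the 0.5 inclusion probability, and the cost itself are all hypothetical, not the nachos API.

```python
# Hypothetical sketch of a factor-wise random search; not the nachos code.
import random


def split_from_selection(points, selection):
    """Train: every factor value is selected. Heldout: no factor value is."""
    train = [p for p in points if all(v in sel for v, sel in zip(p, selection))]
    heldout = [p for p in points if not any(v in sel for v, sel in zip(p, selection))]
    return train, heldout


def example_cost(train, heldout, target=0.2):
    """Hypothetical cost: distance of the heldout fraction from `target`."""
    if not train or not heldout:
        return float("inf")
    return abs(len(heldout) / (len(train) + len(heldout)) - target)


def random_search(points, cost_fn, max_iter=1000, seed=0):
    rng = random.Random(seed)
    # Distinct values of each factor, sorted so a seed is reproducible.
    values = [sorted({p[i] for p in points}) for i in range(len(points[0]))]
    best, best_cost = None, float("inf")
    for _ in range(max_iter):
        # Draw a random subset of each factor's values, independently per factor.
        selection = [{v for v in vals if rng.random() < 0.5} for vals in values]
        train, heldout = split_from_selection(points, selection)
        cost = cost_fn(train, heldout)
        if cost < best_cost:
            best, best_cost = (train, heldout), cost
    return best, best_cost
```

Because every train point uses only selected values and every held-out point only unselected ones, the two sides never share a value of any factor (e.g. no speaker appears on both sides when points are `(speaker, utterance)` pairs).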
Spectral Clustering Splitter
Variable Neighborhood Search (VNS) Splitter
- class nachos.splitters.vns.VNS(sim_fn, constraints, num_shake_neighborhoods=4, num_search_neighborhoods=10, max_iter=200, max_neighbors=2000, seed=0)
Bases: nachos.splitters.abstract_splitter.AbstractSplitter
- __init__(sim_fn, constraints, num_shake_neighborhoods=4, num_search_neighborhoods=10, max_iter=200, max_neighbors=2000, seed=0)
- __call__(d)
- Summary:
Given a dataset, split it using a Variable Neighborhood Search over feasible solutions. A feasible solution is constructed by independently selecting a subset of values from each factor (including all data points associated with those values) and then intersecting the resulting sets. This intersection is guaranteed to be disjoint from the intersection of the complements of these sets, so the two resulting subsets share no data points.
- Parameters
d (Dataset) – The dataset to split
- Returns
The dataset splits and scores
- Return type
Tuple[FactoredSplit, List[float]]
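The following is a generic "basic VNS" sketch over such feasible solutions, not the nachos implementation. It assumes data points are tuples of factor values and represents a solution as the set of `(factor_index, value)` pairs assigned to the train side. The mapping of the constructor parameters is an assumption: `num_shake_neighborhoods` is taken as the largest shake (number of toggled values), `num_search_neighborhoods` as the largest move tried during local search, and `max_neighbors` as the number of candidates sampled per local search. `make_split`, `example_cost` and `vns` are hypothetical names.

```python
# Hypothetical basic-VNS sketch over factor-wise selections; not the nachos code.
import random


def make_split(points, selected):
    """`selected` holds (factor_index, value) pairs assigned to the train side.

    Train = points whose every value is selected (intersection of the per-factor
    subsets); heldout = points whose every value is unselected (intersection of
    the complements). A point cannot satisfy both, so the sides are disjoint.
    """
    train, heldout = [], []
    for p in points:
        flags = [(i, v) in selected for i, v in enumerate(p)]
        if all(flags):
            train.append(p)
        elif not any(flags):
            heldout.append(p)
    return train, heldout


def example_cost(train, heldout, target=0.2):
    """Hypothetical cost: distance of the heldout fraction from `target`."""
    if not train or not heldout:
        return float("inf")
    return abs(len(heldout) / (len(train) + len(heldout)) - target)


def vns(points, cost_fn, num_shake_neighborhoods=4, num_search_neighborhoods=10,
        max_iter=200, max_neighbors=2000, seed=0):
    rng = random.Random(seed)
    items = sorted({(i, v) for p in points for i, v in enumerate(p)})

    def cost(sel):
        return cost_fn(*make_split(points, sel))

    def flip(sel, k):
        # A random member of the k-th neighborhood: toggle k (factor, value) items.
        return sel ^ set(rng.sample(items, min(k, len(items))))

    def local_search(x, x_cost):
        # Randomized first-improvement descent over small flips (assumed scheme).
        for _ in range(max_neighbors):
            y = flip(x, rng.randint(1, num_search_neighborhoods))
            y_cost = cost(y)
            if y_cost < x_cost:
                x, x_cost = y, y_cost
        return x, x_cost

    best = {item for item in items if rng.random() < 0.5}
    best_cost = cost(best)
    for _ in range(max_iter):
        k = 1
        while k <= num_shake_neighborhoods:
            x = flip(best, k)                     # shake
            x, x_cost = local_search(x, cost(x))  # descend
            if x_cost < best_cost:
                best, best_cost = x, x_cost
                k = 1   # improved: restart from the smallest neighborhood
            else:
                k += 1  # no gain: try a larger shake
    return make_split(points, best), best_cost
```

The shake/descend/restart loop is the textbook VNS scheme; the inner `while` always terminates because `k` is reset only on a strict cost improvement.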