FlowMatchingBDT
Bases: BaseEstimator
Flow-matching generative model with gradient-boosted decision trees.
Trains one regressor per flow step to predict the velocity field of a probability path between a source distribution and the data distribution. At inference time, integrates the learned vector field with Euler steps to draw new samples from random source noise.
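The inference loop described above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: `euler_sample` and `toy_velocity` are hypothetical names, the uniform time grid t_i = i / n_steps is an assumption, and a closed-form velocity field stands in for the trained per-step regressors. The toy field is the exact flow toward a single target point under a linear path, so the final Euler step lands on that point.

```python
import numpy as np

def euler_sample(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with n_steps Euler steps."""
    dt = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for i in range(n_steps):
        t = i * dt  # assumed uniform grid; the real class would query its i-th regressor here
        x = x + dt * velocity(x, t)
    return x

# Toy stand-in for the learned field: all probability mass sits at `target`,
# so the linear-path velocity at (x, t) is (target - x) / (1 - t).
target = np.array([2.0, -1.0])

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
noise = rng.normal(size=(5, 2))  # source samples at t=0
samples = euler_sample(toy_velocity, noise, n_steps=50)
```

In the actual estimator, the call to `velocity(x, t)` would presumably be replaced by a prediction from the regressor trained for that step.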
Parameters:
- n_flow_steps (int, default: 50) – Number of discrete time steps along the probability path; one regressor is trained per step.
- n_duplicates (int, default: 100) – Number of independent noise pairings per training point. Higher values give the regressor more Monte Carlo coverage of the conditional distribution p(x_t | x_1).
- estimator (sklearn-compatible regressor, default: HistGradientBoostingRegressor()) – Base estimator cloned and fit at each flow step. Wrapped in MultiOutputRegressor unless multi_output=True.
- multi_output (bool, default: False) – If True, the estimator is assumed to handle multi-output regression natively and is not wrapped in MultiOutputRegressor. Set this when using estimators, such as neural networks, that already predict all output dimensions at once.
- source_distribution (callable, default: np.random.normal) – Function with signature f(size=...) -> ndarray returning samples from the source distribution at t=0.
- path (ProbabilityPath, default: LinearPath()) – Defines the interpolation between source and data via compute_mu_t and compute_flow.
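To make n_duplicates, source_distribution, and the linear path concrete, here is a rough NumPy sketch of how a training set for a single flow step could be assembled. It is an illustration under assumptions rather than the package's code: `make_step_training_set` and `uniform_source` are hypothetical names, and the formulas x_t = (1 - t) x_0 + t x_1 with target x_1 - x_0 are the standard linear-path choice, which may differ from what LinearPath computes internally.

```python
import numpy as np

def uniform_source(size=None):
    """Hypothetical custom source with the f(size=...) -> ndarray signature."""
    return np.random.uniform(-1.0, 1.0, size=size)

def make_step_training_set(x1, t, n_duplicates, source=np.random.normal):
    """Pair every data point with n_duplicates fresh noise draws at time t."""
    x1_rep = np.repeat(x1, n_duplicates, axis=0)  # (n * n_duplicates, d)
    x0 = source(size=x1_rep.shape)                # independent noise per copy
    x_t = (1.0 - t) * x0 + t * x1_rep             # regressor inputs
    v_t = x1_rep - x0                             # regression targets (velocity)
    return x_t, v_t

data = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
x_t, v_t = make_step_training_set(data, t=0.5, n_duplicates=4, source=uniform_source)
```

Each data point appears n_duplicates times with a different noise partner, which is why larger values give the regressor a denser view of the points that can lead to the same x_1.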
Attributes:
- models (list of MultiOutputRegressor) – Trained per-step regressors, set by fit.
- n_features (int) – Dimensionality of the data, set by fit.
Examples:
Unconditional generation on the two-moons dataset:
>>> from sklearn.datasets import make_moons
>>> from flowmatching_bdt import FlowMatchingBDT
>>> data, _ = make_moons(n_samples=2000, noise=0.05, random_state=0)
>>> model = FlowMatchingBDT()
>>> model.fit(data)
>>> samples = model.predict(num_samples=1000)
Conditional generation by passing labels:
>>> import numpy as np
>>> data, labels = make_moons(n_samples=2000, noise=0.05, random_state=0)
>>> model = FlowMatchingBDT()
>>> model.fit(data, conditions=labels)
>>> samples = model.predict(num_samples=1000, conditions=np.ones(1000))
Functions
fit
Fit the generative model to data.
Parameters:
- x_train (ndarray, shape (n_samples, n_features)) – Real data samples.
- conditions (ndarray, shape (n_samples,) or (n_samples, n_conditions), default: None) – Conditioning features for conditional generation.
Returns:
- self (FlowMatchingBDT) – Fitted estimator.
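Since conditions may be given either as a 1-D array or as a 2-D array, a small check before calling fit can catch shape mismatches early. The helper below is a hypothetical convenience, not part of the documented API; it normalizes conditions to shape (n_samples, n_conditions) and verifies the row count against the data.

```python
import numpy as np

def as_condition_matrix(conditions, n_rows):
    """Return conditions as a 2-D float array with one row per sample."""
    cond = np.asarray(conditions, dtype=float)
    if cond.ndim == 1:
        cond = cond.reshape(-1, 1)  # (n_samples,) -> (n_samples, 1)
    if cond.ndim != 2 or cond.shape[0] != n_rows:
        raise ValueError(
            f"expected {n_rows} condition rows, got array of shape {cond.shape}"
        )
    return cond

x_train = np.zeros((6, 2))
labels = np.array([0, 1, 0, 1, 1, 0])
cond = as_condition_matrix(labels, n_rows=len(x_train))
```

The resulting array could then be passed as the conditions argument of fit, which per the parameter description accepts the 2-D form.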
predict
Generate new samples from the fitted model.
Parameters:
- num_samples (int) – Number of samples to generate.
- conditions (ndarray, shape (num_samples,) or (num_samples, n_conditions), default: None) – Conditioning features, one row per generated sample.
Returns:
- (ndarray, shape (num_samples, n_features)) – Generated samples.
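Because predict expects one condition row per generated sample, the conditions array has to be built to match num_samples. A small sketch, assuming binary class labels like those in the two-moons example; `balanced_conditions` is a hypothetical helper, not part of the package.

```python
import numpy as np

def balanced_conditions(num_samples, classes=(0, 1)):
    """Build a 1-D label array of length num_samples, cycling through classes."""
    return np.resize(np.asarray(classes, dtype=float), num_samples)

conditions = balanced_conditions(1000)
# These could then be passed as model.predict(num_samples=1000, conditions=conditions).
```

Cycling through the classes requests an even mix of both moons in a single call, instead of one call per label.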