FlowMatchingBDT

Bases: BaseEstimator

Flow-matching generative model with gradient-boosted decision trees.

Trains one regressor per flow step to predict the velocity field of a probability path between a source distribution and the data distribution. At inference time, integrates the learned vector field with Euler steps to draw new samples from random source noise.
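The sampling loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation: the per-step regressors are replaced by hypothetical closed-form velocity functions (the exact field of a linear path whose data distribution is a single point), and the names `euler_sample`, `make_step`, and `target` are invented for the example.

```python
import numpy as np

def euler_sample(step_velocity_fns, x0):
    """Integrate from t=0 to t=1, taking one Euler step per velocity function."""
    x = np.array(x0, dtype=float)
    h = 1.0 / len(step_velocity_fns)  # uniform step size
    for velocity in step_velocity_fns:
        x = x + h * velocity(x)
    return x

# Toy stand-in for trained regressors (assumption): when the data distribution
# is the single point `target` and the path is linear, the velocity at time t
# is (target - x) / (1 - t).
target = np.array([2.0, -1.0])
n_steps = 50

def make_step(k):
    t = k / n_steps  # time at the start of step k
    return lambda x: (target - x) / (1.0 - t)

step_fns = [make_step(k) for k in range(n_steps)]

rng = np.random.default_rng(0)
noise = rng.normal(size=(1000, 2))      # samples from the source distribution
samples = euler_sample(step_fns, noise)  # every row ends at `target`
```

In the fitted model, each closed-form function would presumably be replaced by the prediction of the regressor trained for that step.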

Parameters:

n_flow_steps (int, default: 50 ) –

Number of discrete time steps along the probability path; one regressor is trained per step.
n_duplicates (int, default: 100 ) –

Number of independent noise pairings per training point. Higher values give the regressor more Monte-Carlo coverage of the conditional distribution p(x_t | x_1).
estimator (sklearn-compatible regressor, default: HistGradientBoostingRegressor() ) –

Base estimator cloned and fit at each flow step. Wrapped in MultiOutputRegressor unless multi_output=True.
multi_output (bool, default: False ) –

If True, the estimator is assumed to handle multi-output regression natively and is not wrapped in MultiOutputRegressor. Set this when using estimators like neural networks that already predict all output dimensions at once.
source_distribution (callable, default: np.random.normal ) –

Function with signature f(size=...) -> ndarray returning samples from the source distribution at t=0.
path (ProbabilityPath, default: LinearPath() ) –

Defines the interpolation between source and data via compute_mu_t and compute_flow.
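To make the roles of `n_duplicates` and `path` concrete, the sketch below builds the regression inputs and targets for a single flow step under a linear path. The signatures of `compute_mu_t` and `compute_flow` are not documented on this page, so the function names and arguments here (`linear_mu_t`, `linear_flow`, `make_step_training_set`) are assumptions chosen for illustration only.

```python
import numpy as np

def linear_mu_t(x0, x1, t):
    """Point on the straight line from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def linear_flow(x0, x1, t):
    """Velocity along the linear path; it does not depend on t."""
    return x1 - x0

def make_step_training_set(x_train, t, n_duplicates, rng):
    """Pair each data point with n_duplicates independent noise draws."""
    x1 = np.repeat(x_train, n_duplicates, axis=0)  # duplicated data points
    x0 = rng.normal(size=x1.shape)                 # one noise draw per copy
    return linear_mu_t(x0, x1, t), linear_flow(x0, x1, t)

rng = np.random.default_rng(0)
x_train = rng.uniform(size=(20, 2))
inputs, targets = make_step_training_set(x_train, t=0.5, n_duplicates=100, rng=rng)
```

A regressor for this step would then be fit to map `inputs` to `targets`. Larger `n_duplicates` values grow the training set linearly, which is the cost of the extra Monte-Carlo coverage.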

Attributes:

models (list of MultiOutputRegressor) –

Trained per-step regressors, set by fit.
n_features (int) –

Dimensionality of the data, set by fit.

Examples:

Unconditional generation on the two-moons dataset:

>>> from sklearn.datasets import make_moons
>>> from flowmatching_bdt import FlowMatchingBDT
>>> data, _ = make_moons(n_samples=2000, noise=0.05, random_state=0)
>>> model = FlowMatchingBDT()
>>> model.fit(data)
>>> samples = model.predict(num_samples=1000)

Conditional generation by passing labels:

>>> import numpy as np
>>> data, labels = make_moons(n_samples=2000, noise=0.05, random_state=0)
>>> model = FlowMatchingBDT()
>>> model.fit(data, conditions=labels)
>>> samples = model.predict(num_samples=1000, conditions=np.ones(1000))

Functions

fit

fit(x_train: ndarray, conditions: ndarray | None = None) -> FlowMatchingBDT

Fit the generative model to data.

Parameters:

x_train (ndarray, shape (n_samples, n_features)) –

Real data samples.
conditions (ndarray, shape (n_samples,) or (n_samples, n_conditions), default: None ) –

Conditioning features for conditional generation, one row per training sample.

Returns:

self ( FlowMatchingBDT ) –

Fitted estimator.
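Because `conditions` may be either one-dimensional or two-dimensional, it can help to normalize it before calling `fit`. The helper below is a hypothetical pre-processing sketch (`as_condition_matrix` is not part of the documented API) that turns either accepted shape into an `(n_samples, n_conditions)` matrix and rejects mismatched row counts.

```python
import numpy as np

def as_condition_matrix(conditions, n_samples):
    """Return conditions as a 2-D float array with one row per sample."""
    c = np.asarray(conditions, dtype=float)
    if c.ndim == 1:
        c = c.reshape(-1, 1)  # a single conditioning feature becomes a column
    if c.ndim != 2 or c.shape[0] != n_samples:
        raise ValueError(
            f"expected {n_samples} rows of conditions, got shape {c.shape}"
        )
    return c

labels = np.array([0, 1, 1, 0])
cond = as_condition_matrix(labels, n_samples=4)  # shape (4, 1)
```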

predict

predict(num_samples: int, conditions: ndarray | None = None) -> ndarray

Generate new samples from the fitted model.

Parameters:

num_samples (int) –

Number of samples to generate.
conditions (ndarray, shape (num_samples,) or (num_samples, n_conditions), default: None ) –

Conditioning features, one row per generated sample.

Returns:

(ndarray, shape (num_samples, n_features)) –

Generated samples.
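Since `conditions` needs one row per generated sample, requesting a mix of classes means building that array up front. The sketch below shows one way to do so and to group the output by condition afterwards. The helpers `balanced_conditions` and `split_by_condition` are hypothetical, and a random placeholder array stands in for the result of `predict` so that the snippet runs without a fitted model.

```python
import numpy as np

def balanced_conditions(num_samples, classes):
    """Cycle through `classes` so each appears about equally often."""
    classes = np.asarray(classes, dtype=float)
    return classes[np.arange(num_samples) % len(classes)]

def split_by_condition(samples, conditions):
    """Group generated rows by the condition value they were drawn under."""
    return {c: samples[conditions == c] for c in np.unique(conditions)}

num_samples = 1000
conditions = balanced_conditions(num_samples, classes=[0, 1])

# With a fitted model this would be:
#   samples = model.predict(num_samples=num_samples, conditions=conditions)
# A placeholder of the documented output shape is used here instead.
samples = np.random.default_rng(0).normal(size=(num_samples, 2))

groups = split_by_condition(samples, conditions)
```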