FlowMatchingBDT

Bases: BaseEstimator

Flow-matching generative model with gradient-boosted decision trees.

Trains one regressor per flow step to predict the velocity field of a probability path between a source distribution and the data distribution. At inference time, integrates the learned vector field with Euler steps to draw new samples from random source noise.
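The training setup described above can be sketched for a single flow step as follows. This is a minimal illustration, not the library's code: the helper name `make_step_dataset`, the standard-normal source, and the linear interpolation formula are all assumptions made for the example.

```python
import numpy as np

def make_step_dataset(x1, t, n_duplicates=100, rng=None):
    """Build (inputs, targets) for the regressor at time t (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    # Repeat every data point so it is paired with several noise draws.
    x1_rep = np.repeat(x1, n_duplicates, axis=0)
    # Source samples at t=0; a standard normal is assumed here.
    x0 = rng.normal(size=x1_rep.shape)
    # Assumed linear path: x_t = (1 - t) * x0 + t * x1.
    x_t = (1.0 - t) * x0 + t * x1_rep
    # Velocity of that path, constant in t: d x_t / dt = x1 - x0.
    v_t = x1_rep - x0
    return x_t, v_t

x1 = np.array([[0.0, 1.0], [2.0, 3.0]])
x_t, v_t = make_step_dataset(x1, t=0.25, n_duplicates=3, rng=0)
```

A regressor for this step would then be fit with `x_t` as inputs and `v_t` as targets, and the procedure repeated for each of the discrete time steps.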

Parameters:

  • n_flow_steps (int, default: 50) –

    Number of discrete time steps along the probability path; one regressor is trained per step.

  • n_duplicates (int, default: 100) –

    Number of independent noise pairings per training point. Higher values give the regressor more Monte-Carlo coverage of the conditional distribution p(x_t | x_1).

  • estimator (sklearn-compatible regressor, default: HistGradientBoostingRegressor()) –

    Base estimator cloned and fit at each flow step. Wrapped in MultiOutputRegressor unless multi_output=True.

  • multi_output (bool, default: False) –

    If True, the estimator is assumed to handle multi-output regression natively and is not wrapped in MultiOutputRegressor. Set this when using estimators like neural networks that already predict all output dimensions at once.

  • source_distribution (callable, default: np.random.normal) –

    Function with signature f(size=...) -> ndarray returning samples from the source distribution at t=0.

  • path (ProbabilityPath, default: LinearPath()) –

    Defines the interpolation between source and data via compute_mu_t and compute_flow.
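To make the role of the path parameter concrete, here is a rough sketch of what a linear path might compute. Only the method names compute_mu_t and compute_flow come from the description above; the class name `LinearPathSketch`, the argument order `(x0, x1, t)`, and the formulas are assumptions for illustration and may differ from the library's LinearPath.

```python
import numpy as np

class LinearPathSketch:
    """Hypothetical stand-in for a linear probability path."""

    def compute_mu_t(self, x0, x1, t):
        # Point on the straight line from x0 (t=0) to x1 (t=1).
        return (1.0 - t) * x0 + t * x1

    def compute_flow(self, x0, x1, t):
        # Time derivative of compute_mu_t; constant for a straight line.
        return x1 - x0

path = LinearPathSketch()
x0 = np.zeros((2, 2))
x1 = np.array([[1.0, 2.0], [3.0, 4.0]])
mid = path.compute_mu_t(x0, x1, 0.5)   # halfway between x0 and x1
flow = path.compute_flow(x0, x1, 0.5)  # regression target at t=0.5
```

Under these assumptions, a different path would only need to keep compute_flow equal to the time derivative of compute_mu_t.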

Attributes:

  • models (list of MultiOutputRegressor) –

    Trained per-step regressors, set by fit.

  • n_features (int) –

    Dimensionality of the data, set by fit.

Examples:

Unconditional generation on the two-moons dataset:

>>> from sklearn.datasets import make_moons
>>> from flowmatching_bdt import FlowMatchingBDT
>>> data, _ = make_moons(n_samples=2000, noise=0.05, random_state=0)
>>> model = FlowMatchingBDT()
>>> model.fit(data)
>>> samples = model.predict(num_samples=1000)

Conditional generation by passing labels:

>>> import numpy as np
>>> data, labels = make_moons(n_samples=2000, noise=0.05, random_state=0)
>>> model = FlowMatchingBDT()
>>> model.fit(data, conditions=labels)
>>> samples = model.predict(num_samples=1000, conditions=np.ones(1000))
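A custom source distribution only needs to follow the f(size=...) -> ndarray signature listed under Parameters. The sketch below defines one such callable; the name `uniform_source` and the choice of a uniform distribution are purely illustrative.

```python
import numpy as np

def uniform_source(size=None):
    """Hypothetical source: uniform on [-1, 1) in every dimension."""
    return np.random.uniform(low=-1.0, high=1.0, size=size)

# The callable is invoked with the desired output shape.
noise = uniform_source(size=(1000, 2))
```

Based on the parameter list, such a callable would presumably be passed as `FlowMatchingBDT(source_distribution=uniform_source)`; that call is not executed here.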

Functions

fit

fit(x_train: ndarray, conditions: ndarray | None = None) -> FlowMatchingBDT

Fit the generative model to data.

Parameters:

  • x_train (ndarray, shape (n_samples, n_features)) –

    Real data samples.

  • conditions (ndarray, shape (n_samples,) or (n_samples, n_conditions), default: None) –

    Conditioning features, one row per training sample, used for conditional generation.
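The two accepted shapes for conditions suggest that a 1-D array is treated as a single conditioning column. A minimal sketch of that kind of normalization is shown below. The helper `stack_conditions` is hypothetical, and appending the conditions as extra regressor input columns is an assumption about how conditioning could work, not a statement about the library's internals.

```python
import numpy as np

def stack_conditions(x, conditions=None):
    """Append conditions as extra columns (hypothetical helper)."""
    if conditions is None:
        return x
    conditions = np.asarray(conditions)
    if conditions.ndim == 1:
        # Shape (n_samples,) becomes (n_samples, 1).
        conditions = conditions.reshape(-1, 1)
    if conditions.shape[0] != x.shape[0]:
        raise ValueError("conditions must have one row per sample")
    return np.hstack([x, conditions])

x = np.zeros((4, 2))
labels = np.array([0, 1, 0, 1])
stacked = stack_conditions(x, labels)  # shape (4, 3)
```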

Returns:

  • (FlowMatchingBDT)

    The fitted model.

predict

predict(num_samples: int, conditions: ndarray | None = None) -> np.ndarray

Generate new samples from the fitted model.

Parameters:

  • num_samples (int) –

    Number of samples to generate.

  • conditions (ndarray, shape (num_samples,) or (num_samples, n_conditions), default: None) –

    Conditioning features, one row per generated sample.

Returns:

  • (ndarray, shape (num_samples, n_features))

    Generated samples.
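Sampling by Euler integration, as mentioned in the class description, can be sketched as below. The function `euler_sample` and the toy velocity functions are illustrative assumptions. In a fitted model, each velocity function would presumably be the prediction of the corresponding per-step regressor.

```python
import numpy as np

def euler_sample(velocity_fns, x0):
    """Integrate dx/dt = v_k(x) from t=0 to t=1, one Euler step per function."""
    n_steps = len(velocity_fns)
    dt = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for v in velocity_fns:
        # Move along the velocity predicted for the current step.
        x = x + dt * v(x)
    return x

# Toy velocity field: the exact linear-path velocity toward a single target
# point, (target - x) / (1 - t), evaluated at t_k = k / n_steps.
target = np.array([2.0, -1.0])
n_steps = 4
fns = [lambda x, t=k / n_steps: (target - x) / (1.0 - t) for k in range(n_steps)]

x0 = np.random.default_rng(0).normal(size=(5, 2))  # source noise at t=0
samples = euler_sample(fns, x0)                    # every row ends at target
```

With this toy field the final Euler step has 1 - t equal to the step size, so each sample lands on the target regardless of where it started. A learned field would instead spread samples over the data distribution.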