NexusFlow: Unifying Disparate Tasks under Partial Supervision via Invertible Flow Networks

Abstract

Partially Supervised Multi-Task Learning (PS-MTL) aims to leverage knowledge across tasks when annotations are incomplete. Existing approaches, however, have largely focused on the simpler setting of homogeneous, dense prediction tasks, leaving the more realistic challenge of learning from structurally diverse tasks unexplored.

To this end, we introduce NexusFlow, a novel, lightweight, and plug-and-play framework effective in both settings. NexusFlow inserts a set of surrogate networks with invertible coupling layers to align the latent feature distributions of tasks, creating a unified representation that enables effective cross-task knowledge transfer. The coupling layers are bijective, preserving information while mapping features into a shared canonical space — avoiding representational collapse and enabling alignment across structurally different tasks without reducing expressive capacity.

We first evaluate NexusFlow on the core challenge of domain-partitioned autonomous driving, where dense map reconstruction and sparse multi-object tracking are supervised in different geographic regions, creating both structural disparity and a strong domain gap. NexusFlow sets a new state-of-the-art on nuScenes, outperforming strong partially supervised baselines. To demonstrate generality, we further test NexusFlow on NYUv2 using three homogeneous dense prediction tasks — segmentation, depth, and surface normals — as a representative N-task PS-MTL scenario. NexusFlow yields consistent gains across all tasks, confirming its broad applicability.

NexusFlow Pipeline

NexusFlow is a simple yet general framework that scales to the N-task partially supervised MTL setting. Given a data batch where only certain tasks are annotated, we extract latent features h_i from all N task heads. These features are then encoded into a shared representation space {z_i} via invertible coupling layers, where the model minimizes the distance between each z_i and their mean to promote cross-task consistency.

For each task we attach a lightweight surrogate network: a deformable-attention feature reducer squashes the large BEV feature map into a compact vector, which is then transformed by an invertible coupling layer into a canonical space. An MSE alignment loss across canonical vectors pulls task distributions together.
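To make the shape bookkeeping of the feature reducer concrete, here is a minimal sketch that swaps the deformable-attention reducer for ordinary attention pooling with a single learned query. `AttentionPoolReducer` is a hypothetical name, and plain `nn.MultiheadAttention` is a simplified stand-in, not the deformable attention used in the paper.

```python
import torch
import torch.nn as nn


class AttentionPoolReducer(nn.Module):
    """Hypothetical stand-in for the reducer: one learned query attends over all BEV cells."""

    def __init__(self, bev_channels=256, num_heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, bev_channels))
        nn.init.normal_(self.query, std=0.02)
        self.attn = nn.MultiheadAttention(bev_channels, num_heads, batch_first=True)

    def forward(self, bev):                         # bev: (B, C, H, W)
        tokens = bev.flatten(2).transpose(1, 2)     # (B, H*W, C)
        q = self.query.expand(bev.size(0), -1, -1)  # (B, 1, C)
        pooled, _ = self.attn(q, tokens, tokens)    # (B, 1, C)
        return pooled.squeeze(1)                    # (B, C) compact vector


reducer = AttentionPoolReducer(bev_channels=64, num_heads=4)
h = reducer(torch.randn(2, 64, 10, 10))             # small sizes for illustration only
```

The resulting (B, C) vector is what a coupling layer would then map into the canonical space.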

Key design choices

Invertible (bijective) coupling layers preserve all task-relevant information while enabling flexible feature transformation — preventing the representational collapse common to vanilla CNN aligners.
Plug-and-play: no change to existing task heads or losses. Drop NexusFlow alongside any BEV-based perception backbone and add L_align to the total loss.
O(N) scaling: one surrogate branch per task, trivially extensible to 3+ tasks (validated on NYUv2 with seg + depth + normal).
Theoretically grounded: invertibility yields a provable bound on feature discrepancy, formally connecting alignment loss to cross-task knowledge transfer.
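The bijectivity claim can be checked numerically. The sketch below implements a single affine coupling layer with an explicit inverse and verifies that the input is recovered from the output. `AffineCoupling`, the hidden width, and the two-layer MLP are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """Illustrative affine coupling layer: the first half of the features conditions the second."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        assert dim % 2 == 0, "Feature dimension must be even"
        half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(half, hidden), nn.ReLU(), nn.Linear(hidden, 2 * half)
        )

    def _scale_shift(self, x1):
        scale, shift = self.net(x1).chunk(2, dim=-1)
        return torch.tanh(scale), shift             # bounded scale for stability

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        scale, shift = self._scale_shift(x1)
        return torch.cat([x1, x2 * torch.exp(scale) + shift], dim=-1)

    def inverse(self, z):
        z1, z2 = z.chunk(2, dim=-1)                 # z1 equals x1, so scale/shift are recomputable
        scale, shift = self._scale_shift(z1)
        return torch.cat([z1, (z2 - shift) * torch.exp(-scale)], dim=-1)


torch.manual_seed(0)
layer = AffineCoupling(dim=8)
x = torch.randn(4, 8)
with torch.no_grad():
    x_rec = layer.inverse(layer(x))
max_err = (x - x_rec).abs().max().item()            # round-trip error, up to float precision
```

Because the first half passes through unchanged, the scale and shift can always be recomputed from the output, which is what makes the transformation exactly invertible regardless of how the MLP is parameterized.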

What does this look like in minimal PyTorch?

Illustration of the core coupling layer operation.

import torch
import torch.nn.functional as F

# 1. Assume you have two task heads, each producing an intermediate embedding
#    (e.g., after flattening the outputs of the BEV encoder).
x_map   = ...   # (B, h_dim)
x_track = ...   # (B, h_dim)

# 2. Make sure h_dim can be split into two halves for the coupling layer.
assert h_dim % 2 == 0, "Hidden dimension must be even for coupling layers"

# 3. Split the features into two halves.
x_map_1,   x_map_2   = x_map.chunk(2,   dim=-1)   # (B, h_dim//2), (B, h_dim//2)
x_track_1, x_track_2 = x_track.chunk(2, dim=-1)   # (B, h_dim//2), (B, h_dim//2)

# 4. Predict scale and shift for both tasks using the first half of the features.
#    The MLP is implemented to double the output dimension, so we can split it into scale and shift.
scale_map,   shift_map   = MLP(x_map_1).chunk(2,   dim=-1)  # (B, h_dim//2), (B, h_dim//2)
scale_track, shift_track = MLP(x_track_1).chunk(2, dim=-1)  # (B, h_dim//2), (B, h_dim//2)

# 5. Stabilize training by constraining the scale (tanh bounds it to (-1, 1)).
scale_map   = torch.tanh(scale_map)
scale_track = torch.tanh(scale_track)

# 6. Scale and shift the second half of the original features. 
#    This is the core invertible operation.
z_map_2   = x_map_2   * torch.exp(scale_map)   + shift_map    # (B, h_dim//2)
z_track_2 = x_track_2 * torch.exp(scale_track) + shift_track  # (B, h_dim//2)

# 7. The first half of the features remains unchanged,
#    and we concatenate the two halves back to get the final aligned features.
z_map   = torch.cat([x_map_1,   z_map_2],   dim=-1)  # (B, h_dim)
z_track = torch.cat([x_track_1, z_track_2], dim=-1)  # (B, h_dim)

# 8. Compute the alignment loss (e.g., MSE between the aligned features of the two tasks).
loss_align = F.mse_loss(z_map, z_track)
loss += lambda_align * loss_align

Note: we package this code in our NexusFlow class.

What Does Alignment Look Like?

We visualize the latent feature distributions of the two disparate tasks before and after applying NexusFlow. Without alignment, tracking and mapping features form disjoint clusters; NexusFlow pulls them into a unified canonical space without collapsing either task's information.

t-SNE of latent features before vs after NexusFlow

t-SNE visualization of latent task features — baseline (left) vs. NexusFlow-aligned (right).

Eigenvalue spectrum of feature covariance

Eigenvalue spectrum of the feature covariance. NexusFlow maintains a broad, well-spread spectrum, indicating that the bijective coupling layers preserve dimensionality — in stark contrast to non-invertible aligners which collapse the spectrum and lose task-relevant capacity.
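For readers who want to run this diagnostic on their own features, the following sketch computes the covariance eigenvalue spectrum and summarizes its spread with an entropy-based effective rank. The effective-rank summary and both function names are assumptions added for illustration; the page itself only reports the spectrum.

```python
import torch


def covariance_spectrum(features):
    """Eigenvalues of the feature covariance, sorted in descending order.

    features: (num_samples, dim)
    """
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (features.size(0) - 1)
    eigvals = torch.linalg.eigvalsh(cov)   # returned in ascending order
    return eigvals.flip(0).clamp_min(0.0)  # descending, tiny negatives clipped


def effective_rank(eigvals, eps=1e-12):
    """exp(entropy) of the normalized spectrum: near dim if spread, small if collapsed."""
    p = eigvals / (eigvals.sum() + eps)
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy).item()


torch.manual_seed(0)
spread = torch.randn(512, 16)                         # well-spread synthetic features
collapsed = torch.randn(512, 2) @ torch.randn(2, 16)  # synthetic rank-2 (collapsed) features
rank_spread = effective_rank(covariance_spectrum(spread))
rank_collapsed = effective_rank(covariance_spectrum(collapsed))
```

On these synthetic inputs the collapsed features have at most two significant eigenvalues, while the spread features use all 16 dimensions; real features from a trained model would replace the random tensors.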

Qualitative Results on nuScenes

Beyond quantitative SOTA, NexusFlow visibly improves both tasks: lane and divider reconstructions are crisper and topologically consistent, while object tracks remain stable even when only the other task was supervised in the corresponding region.

Qualitative comparison on nuScenes — mapping

Map reconstruction comparison. NexusFlow recovers fine-grained structure that the partially supervised baseline misses.

Qualitative comparison on nuScenes — tracking

Multi-object tracking comparison under domain-partitioned supervision.

Drop-in Usage

NexusFlow is designed to be a tiny, self-contained module. Below is the full integration recipe for a BEV-based perception model (e.g., UniAD): just a few lines on top of your existing training loop.

# 1. Instantiate alongside your main model
nexusflow = NexusFlow(bev_channels=256, reduced_dim=256)

# 2. Inside your training loop ...
bev_feature = model.bev_encoder(x)             # (B, C, H, W)

map_out   = model.map_decoder(bev_feature)
track_out = model.track_decoder(bev_feature)
l_task    = partial_loss(map_out, track_out, labels, mask)

# 3. Alignment loss from NexusFlow
z_map, z_track = nexusflow(bev_feature)
l_align        = F.mse_loss(z_map, z_track)

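# 4. Combine and backprop
#    (assumes the optimizer also holds nexusflow.parameters(), so the surrogate networks are trained)
total_loss = l_task + lambda_align * l_align
optimizer.zero_grad()
total_loss.backward()
optimizer.step()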
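The recipe calls `partial_loss` without defining it. One plausible shape for such a function, assuming per-sample boolean masks that mark which task is annotated for each sample, is sketched below. The dictionary layout of `labels` and `mask`, the helper `masked_task_loss`, and the use of MSE as a stand-in for the real mapping and tracking losses are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def masked_task_loss(pred, target, has_label):
    """Average a per-sample loss over annotated samples only (zero if none are annotated).

    pred, target: (B, ...); has_label: (B,) bool.
    Targets of unannotated samples must be finite placeholders (e.g., zeros), not NaN.
    """
    per_sample = F.mse_loss(pred, target, reduction="none").flatten(1).mean(dim=1)  # (B,)
    weight = has_label.float()
    return (per_sample * weight).sum() / weight.sum().clamp_min(1.0)


def partial_loss(map_out, track_out, labels, mask):
    """Sum of task losses, each restricted to the samples where that task is supervised."""
    return (masked_task_loss(map_out, labels["map"], mask["map"])
            + masked_task_loss(track_out, labels["track"], mask["track"]))


torch.manual_seed(0)
map_out, track_out = torch.randn(4, 3, 8, 8), torch.randn(4, 5)
labels = {"map": torch.randn(4, 3, 8, 8), "track": torch.randn(4, 5)}
# Domain-partitioned supervision: samples 0-1 carry map labels, samples 2-3 carry track labels.
mask = {"map": torch.tensor([True, True, False, False]),
        "track": torch.tensor([False, False, True, True])}
l_task = partial_loss(map_out, track_out, labels, mask)
```

With this masking, an unannotated sample contributes nothing to its task loss, so for such samples the only training signal for that task's features comes from the alignment term.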

Full reference implementation: static/NexusFlow.py · Repository: github.com/ark1234/NexusFlow

Citation

@inproceedings{lin2026nexusflow,
  title     = {NexusFlow: Unifying Disparate Tasks under Partial Supervision via Invertible Flow Networks},
  author    = {Lin, Fangzhou and Wang, Yuping and Guo, Yuliang and Huang, Zixun and Huang, Xinyu and Zhang, Haichong and Yamada, Kazunori and Tu, Zhengzhong and Ren, Liu and Zhang, Ziming},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}