CVPR 2026  ·  Paper #7426

NexusFlow: Unifying Disparate Tasks under Partial Supervision
via Invertible Flow Networks

Fangzhou Lin1,2,3, Yuping Wang4, Yuliang Guo5†, Zixun Huang5, Xinyu Huang5,
Haichong Zhang1, Kazunori Yamada3, Zhengzhong Tu2★, Liu Ren5, Ziming Zhang1★

1Worcester Polytechnic Institute  ·  2Texas A&M University  ·  3Tohoku University  ·  4University of Michigan  ·  5Bosch Research North America & Bosch Center for AI

† Project lead  ·  ★ Corresponding authors

Problem Setup & Motivation. We study Partially Supervised Multi-Task Learning with autonomous driving as a representative example. (a) In the ideal case, all data are fully annotated for all tasks; mixed training reaches the upper bound of MTL performance. (b) In practice, community datasets are usually single-task and differ across task labels and domains (e.g., geographic gaps); naive mixing yields incomplete supervision and poor cross-domain generalization. (c) Our goal is to bridge both task and domain gaps via a simple latent-space migration strategy.

Abstract

Partially Supervised Multi-Task Learning (PS-MTL) aims to leverage knowledge across tasks when annotations are incomplete. Existing approaches, however, have largely focused on the simpler setting of homogeneous, dense prediction tasks, leaving the more realistic challenge of learning from structurally diverse tasks unexplored.

To this end, we introduce NexusFlow, a novel, lightweight, and plug-and-play framework effective in both settings. NexusFlow inserts a set of surrogate networks with invertible coupling layers to align the latent feature distributions of tasks, creating a unified representation that enables effective cross-task knowledge transfer. The coupling layers are bijective, preserving information while mapping features into a shared canonical space — avoiding representational collapse and enabling alignment across structurally different tasks without reducing expressive capacity.

We first evaluate NexusFlow on the core challenge of domain-partitioned autonomous driving, where dense map reconstruction and sparse multi-object tracking are supervised in different geographic regions, creating both structural disparity and a strong domain gap. NexusFlow sets a new state-of-the-art on nuScenes, outperforming strong partially supervised baselines. To demonstrate generality, we further test NexusFlow on NYUv2 using three homogeneous dense prediction tasks — segmentation, depth, and surface normals — as a representative N-task PS-MTL scenario. NexusFlow yields consistent gains across all tasks, confirming its broad applicability.

Qualitative Video Comparison on nuScenes

Side-by-side comparison of the UniAD baseline (trained under partial supervision) and NexusFlow on the same nuScenes sequences, jointly predicting multi-object tracking and online map reconstruction. NexusFlow produces noticeably cleaner map elements (lanes, dividers, drivable areas) and more stable object tracks, especially in regions where the baseline lacks task annotations.

Baseline (UniAD, PS-MTL)
Ours — NexusFlow

NexusFlow Pipeline

NexusFlow is a simple yet general framework that scales to the N-task partially supervised MTL setting. Given a data batch where only certain tasks are annotated, we extract latent features h_i from all N task heads. These features are then encoded into a shared representation space {z_i} via invertible coupling layers, where the model minimizes the distance between each z_i and the mean of all z_i to promote cross-task consistency.
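
The alignment objective described above can be written compactly. This is a sketch under stated assumptions: the distance is taken to be a squared Euclidean norm averaged over tasks, and the symbols f_i (the invertible coupling map of task i) and \bar{z} (the mean canonical vector) are notation introduced here for illustration; the exact normalization used in the paper may differ.

```latex
z_i = f_i(h_i), \qquad
\bar{z} = \frac{1}{N}\sum_{i=1}^{N} z_i, \qquad
\mathcal{L}_{\text{align}} = \frac{1}{N}\sum_{i=1}^{N} \lVert z_i - \bar{z} \rVert_2^2
```

For N = 2 each term equals \lVert z_1 - z_2 \rVert_2^2 / 4, so the objective reduces, up to a constant factor, to the squared distance between the two canonical vectors.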

For each task we attach a lightweight surrogate network: a deformable-attention feature reducer squashes the large BEV feature map into a compact vector, which is then transformed by an invertible coupling layer into a canonical space. An MSE alignment loss across canonical vectors pulls task distributions together.
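
The surrogate network described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: global average pooling plus a linear projection stands in for the deformable-attention feature reducer, and the class names `AffineCoupling` and `Surrogate`, the hidden width, and the toy tensor sizes are all assumptions.

```python
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """Affine coupling layer: the first half of the vector parameterizes
    a scale and a shift that are applied to the second half."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        assert dim % 2 == 0, "dim must be even"
        half = dim // 2
        # Maps the first half to a scale and a shift for the second half.
        self.net = nn.Sequential(
            nn.Linear(half, hidden), nn.ReLU(), nn.Linear(hidden, 2 * half)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=-1)
        scale, shift = self.net(x1).chunk(2, dim=-1)
        scale = torch.tanh(scale)  # bound the log-scale for stability
        return torch.cat([x1, x2 * torch.exp(scale) + shift], dim=-1)


class Surrogate(nn.Module):
    """Hypothetical per-task surrogate: pooled reducer + coupling layer.

    NOTE: global average pooling is a simplification that replaces the
    deformable-attention reducer described in the text.
    """

    def __init__(self, bev_channels: int, reduced_dim: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(bev_channels, reduced_dim)
        self.coupling = AffineCoupling(reduced_dim)

    def forward(self, bev: torch.Tensor) -> torch.Tensor:
        h = self.proj(self.pool(bev).flatten(1))  # (B, C, H, W) -> (B, reduced_dim)
        return self.coupling(h)                   # (B, reduced_dim)


torch.manual_seed(0)
bev = torch.randn(2, 256, 50, 50)                 # toy BEV feature map
surrogates = nn.ModuleList([Surrogate(256, 64) for _ in range(2)])
z_map, z_track = (s(bev) for s in surrogates)     # one canonical vector per task
```

Each task gets its own surrogate, so the two canonical vectors differ at initialization even though they are computed from the same BEV features; the alignment loss is what pulls them together during training.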

Key design choices

What does this mean in the simplest, minimal PyTorch?

Illustration of the core coupling layer operation.
import torch
import torch.nn.functional as F

# 1. Assume you have two task heads, each producing an intermediate embedding
#    (e.g., after flattening the outputs of the BEV encoder).
x_map   = ...   # (B, h_dim)
x_track = ...   # (B, h_dim)

# 2. Make sure h_dim can be split into two halves for the coupling layer.
assert h_dim % 2 == 0, "Hidden dimension must be even for coupling layers"

# 3. Split the features into two halves.
x_map_1,   x_map_2   = x_map.chunk(2,   dim=-1)   # (B, h_dim//2), (B, h_dim//2)
x_track_1, x_track_2 = x_track.chunk(2, dim=-1)   # (B, h_dim//2), (B, h_dim//2)

# 4. Predict scale and shift for both tasks using the first half of the features.
#    The MLP maps h_dim//2 -> h_dim, so its output can be split into scale and shift.
scale_map,   shift_map   = MLP(x_map_1).chunk(2,   dim=-1)  # (B, h_dim//2), (B, h_dim//2)
scale_track, shift_track = MLP(x_track_1).chunk(2, dim=-1)  # (B, h_dim//2), (B, h_dim//2)

# 5. Stabilize training by constraining the scale.
scale_map   = torch.tanh(scale_map)
scale_track = torch.tanh(scale_track)

# 6. Scale and shift the second half of the original features. 
#    This is the core invertible operation.
z_map_2   = x_map_2   * torch.exp(scale_map)   + shift_map    # (B, h_dim//2)
z_track_2 = x_track_2 * torch.exp(scale_track) + shift_track  # (B, h_dim//2)

# 7. The first half of the features remains unchanged,
#    and we concatenate the two halves to get the final aligned features.
z_map   = torch.cat([x_map_1,   z_map_2],   dim=-1)  # (B, h_dim)
z_track = torch.cat([x_track_1, z_track_2], dim=-1)  # (B, h_dim)

# 8. Compute the alignment loss (e.g., MSE between the aligned features of the two tasks).
loss_align = F.mse_loss(z_map, z_track)
loss += lambda_align * loss_align
Note: we package this code in our NexusFlow class.
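
To make the "invertible" part concrete, the sketch below inverts the same scale-and-shift operation and checks that the input is recovered. It is an illustration under assumptions: the conditioner `mlp`, its layer sizes, and the function names `coupling_forward` and `coupling_inverse` are hypothetical and not taken from the NexusFlow class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
h_dim = 8
# Hypothetical conditioner: maps the first half to scale and shift (h_dim//2 -> h_dim).
mlp = nn.Sequential(nn.Linear(h_dim // 2, 16), nn.ReLU(), nn.Linear(16, h_dim))


def coupling_forward(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    scale, shift = mlp(x1).chunk(2, dim=-1)
    scale = torch.tanh(scale)
    return torch.cat([x1, x2 * torch.exp(scale) + shift], dim=-1)


def coupling_inverse(z: torch.Tensor) -> torch.Tensor:
    z1, z2 = z.chunk(2, dim=-1)
    # z1 equals x1, so the same scale and shift can be recomputed exactly.
    scale, shift = mlp(z1).chunk(2, dim=-1)
    scale = torch.tanh(scale)
    return torch.cat([z1, (z2 - shift) * torch.exp(-scale)], dim=-1)


with torch.no_grad():
    x = torch.randn(4, h_dim)
    x_rec = coupling_inverse(coupling_forward(x))
    max_err = (x - x_rec).abs().max().item()  # round-trip reconstruction error
```

The round trip works because the first half passes through untouched, so the inverse can rebuild the exact scale and shift without ever inverting the MLP. The tanh bound keeps exp(scale) away from zero and infinity, which keeps the inverse numerically well behaved.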

What Does Alignment Look Like?

We visualize the latent feature distributions of the two disparate tasks before and after applying NexusFlow. Without alignment, tracking and mapping features form disjoint clusters; NexusFlow pulls them into a unified canonical space without collapsing either task's information.

t-SNE visualization of latent task features — baseline (left) vs. NexusFlow-aligned (right).
Eigenvalue spectrum of the feature covariance. NexusFlow maintains a broad, well-spread spectrum, indicating that the bijective coupling layers preserve dimensionality — in stark contrast to non-invertible aligners which collapse the spectrum and lose task-relevant capacity.
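
A spectrum like the one described above can be computed in a few lines. The sketch below is illustrative only: the function names are hypothetical, the toy features are random, and the entropy-based effective rank is a summary statistic chosen here for convenience, not a metric reported by the source.

```python
import torch


def covariance_spectrum(features: torch.Tensor) -> torch.Tensor:
    """Eigenvalues of the feature covariance, sorted in descending order.

    features: (num_samples, dim)
    """
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    eigvals = torch.linalg.eigvalsh(cov)  # ascending; cov is symmetric
    return eigvals.flip(0).clamp_min(0.0)


def effective_rank(eigvals: torch.Tensor, eps: float = 1e-12) -> float:
    """exp(entropy) of the normalized spectrum; higher means less collapse."""
    p = eigvals / eigvals.sum().clamp_min(eps)
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy).item()


torch.manual_seed(0)
spread = torch.randn(512, 16)                        # full-rank toy features
collapsed = torch.randn(512, 1) * torch.ones(1, 16)  # rank-1 toy features

rank_spread = effective_rank(covariance_spectrum(spread))
rank_collapsed = effective_rank(covariance_spectrum(collapsed))
```

A broad spectrum gives an effective rank close to the feature dimension, while a collapsed representation concentrates its variance in a few directions and scores close to 1.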

Qualitative Results on nuScenes

Beyond quantitative SOTA, NexusFlow visibly improves both tasks: lane and divider reconstructions are crisper and topologically consistent, while object tracks remain stable even when only the other task was supervised in the corresponding region.

Map reconstruction comparison. NexusFlow recovers fine-grained structure that the partially supervised baseline misses.
Multi-object tracking comparison under domain-partitioned supervision.

Drop-in Usage

NexusFlow is designed to be a tiny, self-contained module. Below is the full integration recipe for a BEV-based perception model (e.g., UniAD): just a few lines on top of your existing training loop.

# 1. Instantiate alongside your main model
nexusflow = NexusFlow(bev_channels=256, reduced_dim=256)

# 2. Inside your training loop ...
bev_feature = model.bev_encoder(x)             # (B, C, H, W)

map_out   = model.map_decoder(bev_feature)
track_out = model.track_decoder(bev_feature)
l_task    = partial_loss(map_out, track_out, labels, mask)

# 3. Alignment loss from NexusFlow
z_map, z_track = nexusflow(bev_feature)
l_align        = F.mse_loss(z_map, z_track)

# 4. Combine and backprop
total_loss = l_task + lambda_align * l_align
optimizer.zero_grad()   # clear gradients left over from the previous step
total_loss.backward()
optimizer.step()

Full reference implementation: static/NexusFlow.py  ·  Repository: github.com/ark1234/NexusFlow
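
The recipe above calls `partial_loss` without defining it. One way such a masked task loss could look is sketched below. Everything here is an assumption made for illustration: the dictionary layout of `labels` and `mask`, the per-sample boolean masks, and the use of MSE as a stand-in for the real mapping and tracking losses.

```python
import torch
import torch.nn.functional as F


def partial_loss(map_out, track_out, labels, mask):
    """Hypothetical masked multi-task loss.

    labels: dict with "map" and "track" targets, shaped like the outputs.
    mask:   dict with "map" and "track" boolean tensors of shape (B,),
            True where that sample is annotated for the task.
    """
    total = map_out.new_zeros(())
    for name, pred in (("map", map_out), ("track", track_out)):
        keep = mask[name]
        if keep.any():
            # Only annotated samples contribute; MSE is a stand-in loss.
            total = total + F.mse_loss(pred[keep], labels[name][keep])
    return total


torch.manual_seed(0)
map_out = torch.randn(4, 3, requires_grad=True)
track_out = torch.randn(4, 5, requires_grad=True)
labels = {"map": torch.zeros(4, 3), "track": torch.zeros(4, 5)}
# Samples 0-1 carry only map labels, samples 2-3 only tracking labels.
mask = {
    "map": torch.tensor([True, True, False, False]),
    "track": torch.tensor([False, False, True, True]),
}

loss = partial_loss(map_out, track_out, labels, mask)
loss.backward()  # unannotated samples receive zero gradient from the task loss
```

The alignment term needs no labels, so it can be computed on every sample in the batch, including the ones masked out of a task here.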

Citation

@inproceedings{lin2026nexusflow,
  title     = {NexusFlow: Unifying Disparate Tasks under Partial Supervision via Invertible Flow Networks},
  author    = {Lin, Fangzhou and Wang, Yuping and Guo, Yuliang and Huang, Zixun and Huang, Xinyu and Zhang, Haichong and Yamada, Kazunori and Tu, Zhengzhong and Ren, Liu and Zhang, Ziming},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}