CVPR 2026 · Paper #7426
¹Worcester Polytechnic Institute · ²Texas A&M University · ³Tohoku University · ⁴University of Michigan · ⁵Bosch Research North America & Bosch Center for AI
†Project lead · ★Corresponding authors
Partially Supervised Multi-Task Learning (PS-MTL) aims to leverage knowledge across tasks when annotations are incomplete. Existing approaches, however, have largely focused on the simpler setting of homogeneous, dense prediction tasks, leaving the more realistic challenge of learning from structurally diverse tasks unexplored.
To this end, we introduce NexusFlow, a novel, lightweight, plug-and-play framework that is effective in both the homogeneous and the structurally diverse settings. NexusFlow inserts a set of surrogate networks with invertible coupling layers that align the latent feature distributions of the tasks. This creates a unified representation that enables effective cross-task knowledge transfer. Because the coupling layers are bijective, they preserve information while mapping features into a shared canonical space. This avoids representational collapse and enables alignment across structurally different tasks without reducing expressive capacity.
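Here is a minimal sketch of why the coupling step loses no information. It assumes the tanh-bounded affine coupling used in the alignment code on this page. Split a feature $x$ into halves $x_1, x_2$, and let $s(\cdot)$ and $t(\cdot)$ denote the scale and shift predicted by the task's small network. This notation is ours and not necessarily the paper's. The forward map and its closed-form inverse are:

$$
\begin{aligned}
z_1 &= x_1, & z_2 &= x_2 \odot \exp\big(\tanh(s(x_1))\big) + t(x_1),\\
x_1 &= z_1, & x_2 &= \big(z_2 - t(z_1)\big) \odot \exp\big(-\tanh(s(z_1))\big).
\end{aligned}
$$

Because $z_1 = x_1$, the inverse can recompute $s$ and $t$ exactly. The Jacobian is triangular, with an identity block and the diagonal block $\operatorname{diag}\big(\exp(\tanh(s(x_1)))\big)$. Its log-determinant is therefore $\sum_j \tanh\big(s(x_1)\big)_j$, which is finite for every input. The tanh bound additionally keeps each per-dimension scale factor in $(e^{-1}, e)$.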
We first evaluate NexusFlow on the core challenge of domain-partitioned autonomous driving. In this setting, dense map reconstruction and sparse multi-object tracking are supervised in different geographic regions, which creates both a structural disparity and a strong domain gap. NexusFlow sets a new state of the art on nuScenes, outperforming strong partially supervised baselines. To demonstrate generality, we further test NexusFlow on NYUv2 with three homogeneous dense prediction tasks (segmentation, depth, and surface normals) as a representative N-task PS-MTL scenario. NexusFlow yields consistent gains across all three tasks, confirming its broad applicability.
Side-by-side comparison of the UniAD baseline (trained under partial supervision) and NexusFlow on the same nuScenes sequences, jointly predicting multi-object tracking and online map reconstruction. NexusFlow produces noticeably cleaner map elements (lanes, dividers, drivable areas) and more stable object tracks. The improvement is most visible in regions where one of the tasks had no annotations during training.
NexusFlow is a simple yet general framework that scales to the $N$-task partially supervised MTL setting. Given a data batch in which only some tasks are annotated, we extract latent features $h_i$ from all $N$ task heads. Invertible coupling layers then encode these features into a shared representation space, yielding $\{z_i\}$. The model minimizes the distance between each $z_i$ and the mean of all $z_i$ to promote cross-task consistency.
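The page does not spell out the exact distance. As a sketch, assume squared Euclidean distance averaged over tasks and samples. The $N$-task mean-alignment objective could then look like the following. The function name `mean_alignment_loss` is hypothetical and is not taken from the reference implementation.

```python
import torch


def mean_alignment_loss(zs):
    """Hypothetical N-task alignment loss (assumes squared Euclidean distance).

    zs: list of N tensors of shape (B, D), one per task, each already mapped
        into the shared space by that task's coupling layer.
    Returns the squared distance from each z_i to the cross-task mean,
    averaged over tasks and samples.
    """
    z = torch.stack(zs, dim=0)                 # (N, B, D)
    z_bar = z.mean(dim=0, keepdim=True)        # (1, B, D): per-sample mean over tasks
    sq_dist = ((z - z_bar) ** 2).sum(dim=-1)   # (N, B): squared distance to the mean
    return sq_dist.mean()                      # average over tasks and samples
```

For $N = 2$, each $z_i$ lies at half the distance between $z_1$ and $z_2$ from the mean. The loss therefore equals $D/4$ times `F.mse_loss(z_1, z_2)`, so it matches the pairwise MSE used in the two-task snippets up to a constant factor.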
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, h_dim = 8, 256     # illustrative batch size and hidden dimension
lambda_align = 0.1    # illustrative alignment-loss weight

# 1. Assume you have two task heads, each producing an intermediate embedding
#    (e.g., after flattening the outputs of the BEV encoder).
#    Random tensors stand in for the real embeddings here.
x_map = torch.randn(B, h_dim)    # (B, h_dim)
x_track = torch.randn(B, h_dim)  # (B, h_dim)

# 2. Make sure the hidden dimension can be split into two halves for the coupling layer.
assert h_dim % 2 == 0, "Hidden dimension must be even for coupling layers"

# One small MLP per task (an assumption; one surrogate network per task).
# It doubles the output dimension, so the output can be split into scale and shift.
def make_mlp(d):
    return nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 2 * d))

mlp_map, mlp_track = make_mlp(h_dim // 2), make_mlp(h_dim // 2)

# 3. Split the features into two halves.
x_map_1, x_map_2 = x_map.chunk(2, dim=-1)        # (B, h_dim//2), (B, h_dim//2)
x_track_1, x_track_2 = x_track.chunk(2, dim=-1)  # (B, h_dim//2), (B, h_dim//2)

# 4. Predict scale and shift for each task from the first half of its features.
scale_map, shift_map = mlp_map(x_map_1).chunk(2, dim=-1)          # (B, h_dim//2) each
scale_track, shift_track = mlp_track(x_track_1).chunk(2, dim=-1)  # (B, h_dim//2) each

# 5. Stabilize training by bounding the log-scale to (-1, 1).
scale_map = torch.tanh(scale_map)
scale_track = torch.tanh(scale_track)

# 6. Scale and shift the second half of the original features.
#    This is the core invertible operation: x_2 = (z_2 - shift) * exp(-scale).
z_map_2 = x_map_2 * torch.exp(scale_map) + shift_map           # (B, h_dim//2)
z_track_2 = x_track_2 * torch.exp(scale_track) + shift_track   # (B, h_dim//2)

# 7. The first half of the features remains unchanged;
#    concatenate the halves back to get the final aligned features.
z_map = torch.cat([x_map_1, z_map_2], dim=-1)        # (B, h_dim)
z_track = torch.cat([x_track_1, z_track_2], dim=-1)  # (B, h_dim)

# 8. Compute the alignment loss (e.g., MSE between the aligned features of the two tasks)
#    and add it to the task loss.
loss = torch.zeros(())  # stand-in for your partially supervised task loss
loss_align = F.mse_loss(z_map, z_track)
loss = loss + lambda_align * loss_align
```
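The snippet above only runs the coupling forward. The bijectivity claim is easier to check when forward and inverse live in one module. The following is an illustrative sketch using the same tanh-bounded scale-and-shift. The class name `AffineCoupling` and the `hidden` width are our own choices and are not taken from the reference implementation.

```python
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """Illustrative affine coupling layer with an exact inverse."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        assert dim % 2 == 0, "Hidden dimension must be even for coupling layers"
        half = dim // 2
        # Maps the first half to a (scale, shift) pair for the second half.
        self.net = nn.Sequential(nn.Linear(half, hidden), nn.ReLU(), nn.Linear(hidden, 2 * half))

    def _scale_shift(self, x1: torch.Tensor):
        scale, shift = self.net(x1).chunk(2, dim=-1)
        return torch.tanh(scale), shift  # tanh-bounded log-scale, as in the alignment snippet

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=-1)
        scale, shift = self._scale_shift(x1)
        return torch.cat([x1, x2 * torch.exp(scale) + shift], dim=-1)

    def inverse(self, z: torch.Tensor) -> torch.Tensor:
        z1, z2 = z.chunk(2, dim=-1)
        scale, shift = self._scale_shift(z1)  # z1 == x1, so scale and shift are recovered exactly
        return torch.cat([z1, (z2 - shift) * torch.exp(-scale)], dim=-1)


torch.manual_seed(0)
layer = AffineCoupling(dim=256)
x = torch.randn(8, 256)
print(torch.allclose(layer.inverse(layer(x)), x, atol=1e-5))  # True
```

The network `net` can be arbitrarily complex without breaking invertibility. It only ever sees the untouched half, which the inverse can read directly from $z$.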
We visualize the latent feature distributions of the two disparate tasks before and after applying NexusFlow. Without alignment, tracking and mapping features form disjoint clusters; NexusFlow pulls them into a unified canonical space without collapsing either task's information.
Beyond the quantitative state-of-the-art results, NexusFlow visibly improves both tasks. Lane and divider reconstructions are crisper and topologically consistent, and object tracks remain stable even in regions where only the other task was supervised.
NexusFlow is designed to be a tiny, self-contained module. Below is the full integration recipe for a BEV-based perception model such as UniAD. It adds just a few lines on top of your existing training loop.
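```python
import torch.nn.functional as F
# NexusFlow comes from the reference implementation (static/NexusFlow.py).
# model, loader, partial_loss, optimizer, and lambda_align are your own objects.

# 1. Instantiate alongside your main model. Make sure the optimizer also receives
#    nexusflow.parameters() so that the coupling layers are trained.
nexusflow = NexusFlow(bev_channels=256, reduced_dim=256)

# 2. Inside your training loop ...
for x, labels, mask in loader:  # mask: which task annotations are available
    bev_feature = model.bev_encoder(x)  # (B, C, H, W)
    map_out = model.map_decoder(bev_feature)
    track_out = model.track_decoder(bev_feature)
    l_task = partial_loss(map_out, track_out, labels, mask)

    # 3. Alignment loss from NexusFlow
    z_map, z_track = nexusflow(bev_feature)
    l_align = F.mse_loss(z_map, z_track)

    # 4. Combine and backprop (clear stale gradients first)
    optimizer.zero_grad()
    total_loss = l_task + lambda_align * l_align
    total_loss.backward()
    optimizer.step()
```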
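To exercise the recipe without the full repository, here is a hypothetical stand-in called `ToyNexusFlow`. It is not the reference `NexusFlow` class, and its internals are our assumption. It keeps the same call signature (`bev_channels`, `reduced_dim`, a BEV tensor in, two aligned embeddings out). It global-average-pools the BEV map, applies a task-specific linear projection, and passes each projected embedding through its own tanh-bounded affine coupling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

TASKS = ("map", "track")


class ToyNexusFlow(nn.Module):
    """Hypothetical stand-in for NexusFlow; NOT the reference implementation."""

    def __init__(self, bev_channels: int = 256, reduced_dim: int = 256):
        super().__init__()
        assert reduced_dim % 2 == 0, "reduced_dim must be even for coupling layers"
        half = reduced_dim // 2
        # Task-specific projection from pooled BEV channels to a reduced embedding.
        self.proj = nn.ModuleDict({t: nn.Linear(bev_channels, reduced_dim) for t in TASKS})
        # Task-specific coupling network: first half -> (scale, shift) for the second half.
        self.coupling = nn.ModuleDict({t: nn.Linear(half, 2 * half) for t in TASKS})

    def _couple(self, task: str, h: torch.Tensor) -> torch.Tensor:
        h1, h2 = h.chunk(2, dim=-1)
        scale, shift = self.coupling[task](h1).chunk(2, dim=-1)
        return torch.cat([h1, h2 * torch.exp(torch.tanh(scale)) + shift], dim=-1)

    def forward(self, bev_feature: torch.Tensor):
        pooled = bev_feature.mean(dim=(2, 3))  # (B, C, H, W) -> (B, C)
        return tuple(self._couple(t, self.proj[t](pooled)) for t in TASKS)


# Usage with random features in place of a real BEV encoder output.
torch.manual_seed(0)
nexusflow = ToyNexusFlow(bev_channels=256, reduced_dim=256)
optimizer = torch.optim.SGD(nexusflow.parameters(), lr=0.1)  # in practice, add model.parameters()
bev_feature = torch.randn(2, 256, 50, 50)  # (B, C, H, W)

z_map, z_track = nexusflow(bev_feature)  # each (B, reduced_dim)
l_align = F.mse_loss(z_map, z_track)
optimizer.zero_grad()
l_align.backward()
optimizer.step()
```

In a real model, the two embeddings would more likely come from the map and tracking branches than from a single pooled tensor. The toy version only shows that gradients from the alignment loss reach both task-specific coupling networks.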
Full reference implementation: static/NexusFlow.py · Repository: github.com/ark1234/NexusFlow
@inproceedings{lin2026nexusflow,
title = {NexusFlow: Unifying Disparate Tasks under Partial Supervision via Invertible Flow Networks},
author = {Lin, Fangzhou and Wang, Yuping and Guo, Yuliang and Huang, Zixun and Huang, Xinyu and Zhang, Haichong and Yamada, Kazunori and Tu, Zhengzhong and Ren, Liu and Zhang, Ziming},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}