This is ongoing research conducted at the A2S Lab investigating whether 3D-consistent novel-view synthesis improves data-efficient object detection under viewpoint shift. The core question is whether LoRA adapters fine-tuned on Zero123++-generated synthetic views, which preserve 3D geometry, outperform adapters trained on 2D augmentation baselines when both are tested on held-out viewpoints of the same objects from Google Scanned Objects.
Motivation
Real-world robotics and AR applications frequently encounter objects at viewpoints not seen during training. Collecting multi-view imagery for every object class is expensive. Zero123++ offers a way to synthesize geometrically consistent novel views from a single image, but it is not yet clear whether that geometric consistency translates into meaningful gains for downstream detection models compared to cheaper 2D augmentations like rotation and perspective warp.
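To make the "cheaper 2D augmentations" baseline concrete, rotation and perspective-style warps are just affine remappings of pixel coordinates. The sketch below (numpy, nearest-neighbor sampling) is an illustrative reimplementation, not the project's actual augmentation pipeline, which would more likely use a library such as torchvision or albumentations; the function names are hypothetical.

```python
import numpy as np

def affine_warp(img, matrix):
    """Apply an inverse-mapped 2D affine warp with nearest-neighbor sampling.
    `matrix` (2x3) maps each output pixel (x, y, 1) back to input coordinates."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = matrix @ coords
    sx = np.round(src[0]).astype(int)
    sy = np.round(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ys.ravel()[valid], xs.ravel()[valid]] = img[sy[valid], sx[valid]]
    return out

def rotation_about_center(theta, h, w):
    """Inverse rotation matrix (output -> input) about the image center."""
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    c, s = np.cos(-theta), np.sin(-theta)
    # translate center to origin, rotate by -theta, translate back
    return np.array([[c, -s, cx - c * cx + s * cy],
                     [s,  c, cy - s * cx - c * cy]])
```

The key point of the comparison: such warps reshuffle pixels of a single view but cannot reveal occluded geometry, whereas Zero123++ synthesizes views consistent with the object's 3D shape.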
Technical Approach
The experiment is structured as a controlled comparison: a pretrained detection backbone is fine-tuned with LoRA under two data regimes, synthetic views from Zero123++ versus 2D-augmented versions of the same seed images. Both adapters are trained with matched compute and evaluated on a held-out set of novel viewpoints from Google Scanned Objects. UMAP embeddings of feature activations are used to visualize how each adapter organizes viewpoint-variant representations.
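The LoRA mechanism underlying both regimes can be sketched in a few lines of numpy: a frozen pretrained weight is augmented with a trainable low-rank update. This is the generic LoRA formulation, not the project's adapter code, which would attach such updates to the detection backbone's layers via a library like peft; all sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8  # illustrative dimensions and rank

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight (not updated)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus scaled low-rank update: (W + (alpha / r) * B @ A) @ x.
    # Only A and B receive gradients during fine-tuning; W stays frozen.
    return W @ x + (alpha / r) * (B @ (A @ x))
```

Because B starts at zero, the adapter is an exact no-op at initialization, so fine-tuning departs smoothly from the pretrained detector under either data regime.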
Results
This project is ongoing. Results and the full experimental write-up will be released upon completion of the study. The current focus is on ablating the number of synthetic views per object and the angular distribution of viewpoint coverage in the training set.
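One way to frame the angular-coverage ablation is as a partition of azimuth angles into training and held-out bins, with the held-out bins serving as the novel-viewpoint evaluation set. The sketch below is a hypothetical protocol for illustration; the study's actual viewpoint sampling and hold-out scheme may differ.

```python
import numpy as np

def split_viewpoints(n_views, held_out_fraction=0.25, seed=0):
    """Evenly space azimuths over 360 degrees and hold out a random,
    disjoint subset as novel evaluation viewpoints (illustrative only)."""
    azimuths = np.linspace(0.0, 360.0, n_views, endpoint=False)
    rng = np.random.default_rng(seed)
    n_held = max(1, int(round(held_out_fraction * n_views)))
    held_idx = rng.choice(n_views, size=n_held, replace=False)
    mask = np.zeros(n_views, dtype=bool)
    mask[held_idx] = True
    return azimuths[~mask], azimuths[mask]  # (train views, held-out views)
```

Varying `n_views` and `held_out_fraction` then corresponds directly to the two ablation axes mentioned above: synthetic views per object and angular coverage of the training set.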