During my computer vision research at UCSD, I worked with two other undergraduate researchers to develop methodologies for 3D scene understanding, evaluation, and synthesis that integrate large language models (LLMs) and vision-language models (VLMs). Our work focused on using these models to address spatial reasoning challenges and optimize 3D scene layouts.
Key Contributions
- Enhanced the SceneProgLLM framework by integrating the Anthropic and Ollama APIs and by enabling support for local images in scene rendering pipelines.
- Collaborated on designing Domain-Specific Languages (DSLs) for scene synthesis that give LLMs a structured way to interact with 3D databases like 3D-FRONT.
- Designed and implemented prompts to evaluate how different ceiling light positions affect 3D scene illumination, using LLMs to score rendered images and give detailed reasoning for each score.
- Implemented functions in Blender for light placement optimization and explored both heuristic-based approaches and reinforcement learning techniques.
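To make the heuristic side of light placement concrete, here is a minimal sketch of one possible approach. It is not the project's Blender implementation. It assumes an empty rectangular room, a single ceiling point light, and a simple inverse-square falloff model. All function names and parameters (`floor_illumination`, `worst_lit_value`, `best_light_position`, `samples`, `grid`) are hypothetical and chosen only for illustration.

```python
import math
from itertools import product


def floor_illumination(point, light_xy, ceiling_height):
    """Relative illumination at a floor point from a ceiling point light.

    Assumed model: inverse-square falloff times the cosine of the
    angle of incidence. Units are arbitrary.
    """
    dx = point[0] - light_xy[0]
    dy = point[1] - light_xy[1]
    dist_sq = dx * dx + dy * dy + ceiling_height ** 2
    cos_incidence = ceiling_height / math.sqrt(dist_sq)
    return cos_incidence / dist_sq


def worst_lit_value(light_xy, width, depth, ceiling_height, samples=5):
    """Illumination of the darkest sampled floor point for one light position."""
    xs = [(i + 0.5) * width / samples for i in range(samples)]
    ys = [(j + 0.5) * depth / samples for j in range(samples)]
    return min(
        floor_illumination(p, light_xy, ceiling_height) for p in product(xs, ys)
    )


def best_light_position(width, depth, ceiling_height, grid=9):
    """Grid search: pick the ceiling position whose darkest floor point is brightest."""
    candidates = [
        ((i + 0.5) * width / grid, (j + 0.5) * depth / grid)
        for i in range(grid)
        for j in range(grid)
    ]
    return max(
        candidates,
        key=lambda c: worst_lit_value(c, width, depth, ceiling_height),
    )


if __name__ == "__main__":
    # Hypothetical 6 m x 4 m room with a 2.5 m ceiling.
    print(best_light_position(6.0, 4.0, 2.5))
```

For an empty rectangular room this objective selects the center candidate, which is a useful sanity check. A real scene would need occlusion from furniture and a renderer in the loop, which is where Blender and learned or LLM-based scoring would come in.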
Methodology & Approach
Our research involved designing strategies to communicate global and local frames of reference to LLMs for tasks like object localization and lighting direction analysis. We integrated computer vision tools like SAM2, Detectron2, and CLIP into a visual evaluator pipeline to improve semantic filtering and depth estimation for scene evaluation. We also experimented with structured outputs under varying conditions and explored the trade-offs between engineered DSLs and LLM-designed DSLs.
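As an illustration of what communicating both frames of reference can look like, the sketch below converts an object's global position into a viewer-relative description that could be placed in a prompt. This is an assumed example, not the pipeline we used. The 2D top-down simplification, the yaw convention (counterclockwise from the global +x axis), and the names `to_local_frame` and `describe_for_prompt` are all hypothetical.

```python
import math


def to_local_frame(obj_xy, viewer_xy, viewer_yaw_deg):
    """Express a global (x, y) point in the viewer's local frame.

    Returns (forward, left) in the same units as the inputs. Yaw is
    assumed to be measured counterclockwise from the global +x axis.
    """
    dx = obj_xy[0] - viewer_xy[0]
    dy = obj_xy[1] - viewer_xy[1]
    yaw = math.radians(viewer_yaw_deg)
    # Project the offset onto the viewer's forward and left unit vectors.
    forward = dx * math.cos(yaw) + dy * math.sin(yaw)
    left = -dx * math.sin(yaw) + dy * math.cos(yaw)
    return forward, left


def describe_for_prompt(name, obj_xy, viewer_xy, viewer_yaw_deg, tol=1e-6):
    """Build one prompt line stating both the global and the local position."""
    forward, left = to_local_frame(obj_xy, viewer_xy, viewer_yaw_deg)
    parts = []
    if abs(forward) > tol:
        side = "in front of" if forward > 0 else "behind"
        parts.append(f"{abs(forward):.1f} m {side} the viewer")
    if abs(left) > tol:
        side = "left" if left > 0 else "right"
        parts.append(f"{abs(left):.1f} m to the viewer's {side}")
    local = " and ".join(parts) if parts else "at the viewer's position"
    return f"{name}: global position ({obj_xy[0]:.1f}, {obj_xy[1]:.1f}); {local}."


if __name__ == "__main__":
    # Viewer at the origin facing the global +y axis (yaw = 90 degrees).
    print(describe_for_prompt("lamp", (1.0, 2.0), (0.0, 0.0), 90.0))
```

Stating the same position in both frames gives the model a way to cross-check its answer. Relative terms like "left" are only meaningful once the viewer's heading is fixed.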
Research Showcase
Final Research Poster
We presented our findings at the Undergrad Engineering Research Symposium, summarizing our methodology for using agentic VLMs for 3D scene evaluation.
Symposium & Team Photos
Our work involved regular team meetings, brainstorming sessions, and a presentation at the final symposium.
Photos: Research Discussion, Planning Research Direction, Analyzing Results, Whiteboard Brainstorming, Symposium Completion, Poster Presentation.
Technologies & Tools
- Python
- Blender
- LangChain
- CLIP
- SAM2
- Detectron2
- LLaMA
- OpenAI APIs
- Docker
- GitHub