UCSD ERSP Research | Fong-Yu (Yang) Lin

During my computer vision research at UCSD, I worked with two other undergraduate researchers on methods for 3D scene understanding, evaluation, and synthesis that integrate large language models (LLMs) and vision-language models (VLMs). Our work focused on using these models to address spatial reasoning challenges and optimize 3D scene layouts.

Key Contributions

Enhanced the SceneProgLLM framework by adding support for the Anthropic and Ollama APIs and for local images in scene rendering pipelines.
Collaborated on designing domain-specific languages (DSLs) for scene synthesis to provide structured interaction between LLMs and 3D databases like 3D-FRONT.
Designed and implemented prompts to evaluate how different ceiling light positions affect 3D scene illumination, using LLMs to score rendered images and explain their reasoning in detail.
Implemented functions in Blender for light placement optimization and explored both heuristic-based approaches and reinforcement learning techniques.
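To illustrate what a heuristic approach to ceiling light placement can look like, here is a minimal sketch. It is not the project's actual implementation: the function names, the grid of candidate positions, and the scoring rule (maximize the brightness of the darkest floor sample, using inverse-square falloff with a cosine term) are all assumptions chosen for illustration. It runs in plain Python without Blender.

```python
import itertools
import math


def illuminance(light, point, intensity=1.0):
    """Illuminance at a floor point from a point light (hypothetical model).

    Uses inverse-square falloff times the cosine of the angle between the
    floor normal (straight up) and the direction to the light.
    """
    dx = light[0] - point[0]
    dy = light[1] - point[1]
    dz = light[2] - point[2]
    d2 = dx * dx + dy * dy + dz * dz
    cos_theta = dz / math.sqrt(d2)
    return intensity * cos_theta / d2


def score_position(light, samples):
    """Score a candidate light by the darkest sample it leaves on the floor."""
    return min(illuminance(light, p) for p in samples)


def best_ceiling_light(width, depth, height, steps=5):
    """Grid-search ceiling positions and return the best-scoring (x, y, z)."""
    xs = [width * i / (steps - 1) for i in range(steps)]
    ys = [depth * j / (steps - 1) for j in range(steps)]
    # Floor sample points at z = 0 and candidate lights at ceiling height.
    samples = [(x, y, 0.0) for x, y in itertools.product(xs, ys)]
    candidates = [(x, y, height) for x, y in itertools.product(xs, ys)]
    return max(candidates, key=lambda c: score_position(c, samples))


if __name__ == "__main__":
    print(best_ceiling_light(4.0, 4.0, 2.5))
```

For an empty rectangular room this heuristic picks the center of the ceiling. Inside Blender, the returned coordinates could be assigned to a light object's location. A reinforcement learning approach would replace the exhaustive grid search with a learned policy.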

Methodology & Approach

Our research involved designing strategies to communicate global and local frames of reference to LLMs for tasks like object localization and lighting direction analysis. We integrated computer vision tools like SAM2, Detectron2, and CLIP into a visual evaluator pipeline to improve semantic filtering and depth estimation for scene evaluation. This work included experimenting with structured outputs under varying conditions and exploring the trade-offs between engineered DSLs and LLM-designed DSLs.
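The difference between global and local frames of reference can be made concrete with a small sketch. The code below is an illustrative assumption, not the pipeline we used: it converts a world-space position into an object's local frame (forward and left axes, with the object's heading given as a yaw angle) and turns the result into a phrase that could be placed in an LLM prompt. The function names and the wording of the phrases are hypothetical.

```python
import math


def world_to_local(point, origin, yaw):
    """Express a world-space (x, y) point in an object's local frame.

    `origin` is the object's world position and `yaw` its heading in radians,
    measured counterclockwise from the world +x axis. Returns (forward, left).
    """
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    c, s = math.cos(yaw), math.sin(yaw)
    forward = c * dx + s * dy   # projection onto the heading direction
    left = -s * dx + c * dy     # projection onto the direction 90 degrees left
    return forward, left


def describe_relative(point, origin, yaw, eps=1e-6):
    """Describe where `point` lies relative to the object, in plain words."""
    forward, left = world_to_local(point, origin, yaw)
    parts = []
    if forward > eps:
        parts.append("in front")
    elif forward < -eps:
        parts.append("behind")
    if left > eps:
        parts.append("to the left")
    elif left < -eps:
        parts.append("to the right")
    return " and ".join(parts) or "at the same position"


if __name__ == "__main__":
    # An object at the origin facing world +y, and a lamp at (1, 1).
    print(describe_relative((1.0, 1.0), (0.0, 0.0), math.pi / 2))
```

The same world coordinates produce different descriptions depending on which object's frame is used, which is why the frame has to be stated explicitly in the prompt.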

Research Showcase

Final Research Poster

We presented our findings at the Undergrad Engineering Research Symposium, summarizing our methodology for using agentic VLMs for 3D scene evaluation.

Symposium & Team Photos

Our work involved regular team meetings, brainstorming sessions, and a presentation at the final symposium.

Research Discussion

Planning Research Direction

Analyzing Results

Whiteboard Brainstorming

Symposium Completion

Poster Presentation

Technologies & Tools

Python
Blender
LangChain
CLIP
SAM2
Detectron2
LLaMA
OpenAI APIs
Docker
GitHub
