Ubitus Internship | Fong-Yu (Yang) Lin

During my 12-week internship at Ubitus K.K., I joined the Terminal Device Group (TDG) working across web frontend, 3D, and CNF (Console and Frontier) teams. This transformative experience involved developing cutting-edge AI infrastructure solutions, from semantic search optimization to company-wide security frameworks, while gaining profound insights into engineering philosophy and problem-solving approaches.

Project Overview

My primary focus was developing a Model Context Protocol (MCP) server to enable UbiGPT, the company's large language model, to perform accurate and efficient semantic searches across multiple Milvus vector database collections. The project evolved through three distinct phases, each building upon the previous to create a comprehensive solution that achieved a 77.8% F1 Score.

Beyond the core MCP project, I architected a company-wide prompt injection defense framework and served as the primary technical operator for the G1 humanoid robot at the AI WAVE exhibition, providing a well-rounded experience in AI development, security, and real-world applications.

Phase 1: MCP Server Foundation

The journey began with inheriting and transforming a coworker's database search algorithm from milvus_memory.py into a robust, containerized MCP server. This foundational work established the groundwork for all subsequent enhancements.

System Architecture Development

Designed and built a Python-based MCP server to act as middleware between the UbiGPT LLM and the Milvus vector database. The architecture required careful consideration of connection stability, data formatting, and error handling to ensure reliable operation in a production environment.

Key Technical Achievements

Successfully containerized the existing database search script using Docker
Established stable connection checks and consistent aliasing for MilvusDB
Engineered data standardization, transforming returned document format from a list of dictionaries into structured JSON strings
Implemented robust error handling and logging mechanisms
Created comprehensive documentation for future maintenance and scaling

This phase laid the critical groundwork for single collection searches while designing the system with extensibility in mind for future cross-collection capabilities.

Phase 2: Cross-Collection Search Architecture

The most intellectually challenging phase involved enabling searches across multiple database collections. This required innovative thinking about how to balance LLM cognitive load, system performance, and search accuracy.

Three Architectural Approaches

Method A: Naive Approach (Discarded)

Initially considered searching all collections simultaneously but proactively discarded this approach after identifying significant performance bottlenecks and resource consumption issues.

Method B: "Three Tools" Approach

Developed a sophisticated system providing the LLM with three discrete tools: list_collections, list_schemas, and search. This enabled agentic discovery and step-by-step query processing, giving the LLM maximum flexibility in determining search strategies.

Method C: "One Tool" Approach (Selected)

Engineered a more efficient solution that abstracts database complexity by pre-generating comprehensive descriptions of all collections and schemas. This approach reduced cognitive load on the LLM while maintaining search effectiveness and improving overall system performance.

Implementation Details

The "One Tool" approach involved creating intelligent abstractions that combined collection metadata with schema information into easily digestible formats for the LLM. This required careful consideration of information density, relevance filtering, and presentation formatting to optimize LLM comprehension without overwhelming the context window.

Phase 3: Search Result Optimization

The final phase focused on dramatically improving search accuracy by transitioning from basic distance metrics to sophisticated semantic analysis techniques.

From L2 Distance to Re-ranking Models

Initially, the system relied on L2 (Euclidean) distance for filtering search results. While computationally efficient, this approach often missed semantically relevant results that didn't align perfectly in vector space. The transition to re-ranking models represented a significant leap in search quality.

Re-ranking Implementation

Leveraged advanced re-ranking models to analyze semantic relevance between raw user query text and retrieved documents. This approach considers contextual meaning, synonyms, and conceptual relationships that pure vector distance calculations might miss.

Trade-off Analysis

L2 Distance Method

Pros: Faster computation, lower latency
Cons: Less nuanced, may miss relevant results
Use Case: High-volume, speed-critical applications

Re-ranking Model

Pros: Higher precision, semantic understanding
Cons: Increased computation time
Use Case: Quality-focused, accuracy-critical searches

After comprehensive analysis, the re-ranking approach was selected as the optimal solution, justifying the slight performance trade-off with significantly improved search relevance and user satisfaction.

Company-Wide Security Framework

Following the successful completion of the MCP server project, I was tasked with developing a comprehensive prompt injection defense framework to protect the company's AI applications from sophisticated attacks.

Multi-LLM Architecture

Designed a sophisticated system featuring separate target, controller, and improver LLMs working in concert to create a complete test-and-fix cycle for prompt vulnerabilities.

Framework Features

Comprehensive Test Coverage: 63 pre-built rules across 7 distinct attack categories
Automated Improvement: System automatically suggests fixes for vulnerable prompts
Flexible Evaluation: Condition-based pass/fail criteria customizable for each test scenario
YAML Configuration: Easy-to-modify rule sets with customizable conditions
Multiple Output Formats: Both human-readable logs and machine-parseable JSON export
Scalable Deployment: Designed for company-wide rollout across multiple AI applications

Impact and Results

The framework achieved a remarkable 100% mitigation rate against injection attacks across five AI applications, providing the QA team with powerful automated vulnerability assessment capabilities. This success led to company-wide adoption and integration into the standard testing pipeline.

Team Presentation

During the 11th week of my internship, I presented the security framework to managers and the QA team, demonstrating its capabilities and training team members on proper usage and customization techniques.

AI WAVE Exhibition - G1 Humanoid Robot

Beyond software development, I gained invaluable experience as the primary technical operator for the G1 humanoid robot at the AI WAVE exhibition, bridging the gap between cutting-edge AI technology and public engagement.

Exhibition Highlights

Live Robot Demonstration

Demonstrating the G1 humanoid robot's capabilities during live interactive sessions with exhibition visitors.

Exhibition Gallery

AI WAVE Exhibition Setup — Marketing, BD, and R&D Team at the AI WAVE exhibition booth

Robot Interaction with Visitors — Facilitating interactive demonstrations between visitors and the G1 humanoid robot

Technical Operation of G1 Robot — Introducing AI chatting feature being put on the robot

Technical Responsibilities

Real-time technical troubleshooting and system monitoring during live demonstrations
Interactive presentations for diverse audiences including industry professionals and general public
Data collection for future reinforcement learning model development and user behavior analysis
Cross-team coordination with Marketing and BD teams for seamless exhibition presentations
User interaction analysis and comprehensive feedback documentation for product development
Technical setup and calibration of robot systems for optimal exhibition performance

Skills Developed

This role required rapid problem-solving under pressure, clear communication with non-technical audiences, and the ability to explain complex AI concepts in accessible terms. The experience provided crucial insights into the practical deployment challenges of AI systems in real-world environments, while developing skills in public demonstration, technical presentation, and cross-functional collaboration in high-stakes exhibition settings.

Performance Analysis & Data-Driven Decision Making

A critical aspect of the project involved designing and executing comprehensive testing frameworks to validate system performance and guide architectural decisions.

Testing Methodology

Created extensive Google Sheets with multiple tabs to compare the "One Tool" and "Three Tools" methodologies across five critical evaluation criteria:

Accuracy: Using F1 Score as the primary metric to balance Precision and Recall
Runtime Performance: End-to-end latency analysis and bottleneck identification
Single vs. Cross-Collection: Comparative effectiveness across search scopes
Cross-Collection Keyword Search: Specialized testing for multi-collection scenarios
Multilingual Capabilities: Validation across English, Japanese, and Traditional Chinese

Key Performance Insights

Runtime Analysis

Discovered that 95% of processing time originated from LLM generation rather than database operations, informing future optimization strategies.

Accuracy Comparison

"One Tool" approach achieved superior F1 Score of 77.8% compared to 76.4% for the "Three Tools" method.

Speed Optimization

Overall runtime improved from 9,357ms to 9,194ms with the optimized architecture.

Multilingual Validation

Conducted comprehensive testing using identical queries translated into English, Japanese, and Traditional Chinese, confirming consistent performance and relevant results across all supported languages. This validation was crucial for the international deployment of the system.

Future Enhancement Proposals

Based on performance analysis, I proposed implementing LangGraph with a verification agent architecture to critique and refine outputs from the generation agent, potentially reducing errors without significant time costs. Additionally, identified opportunities for implementing UV for better Python environment management across the organization.

Mentorship & Philosophical Growth

Beyond technical achievements, this internship provided profound personal and philosophical growth through mentorship from Vic Chung and Melody Wang, whose insights fundamentally shaped my understanding of engineering and problem-solving.

Engineering Philosophy

My mentors taught me that engineering transcends mere coding—it's fundamentally about problem-solving. As they eloquently explained, "coding is just 描述事情的邏輯，是將邏輯具象化的工具" (coding is merely describing logic and making logic concrete). This perspective shift helped me understand that finding the right people to solve problems is as valuable as solving them yourself.

Profound Conversations

Our discussions explored deep philosophical questions that continue to influence my thinking:

Nature of Life: What defines a living creature and consciousness
Adaptation: The principle of 適者生存 (survival of the fittest) in technology and life
Dimensions of Existence: The relationship between 物質與精神 (material and spiritual dimensions)
Multi-dimensional Thinking: Understanding problems across different conceptual dimensions
Spiritual Technology: The intersection of technological advancement and human spirit
Metaphysical Questions: Discussions about higher-order organizing principles

Transformative Impact

These conversations were very worth continuing to think about deeply. They fundamentally changed my approach to engineering challenges, encouraging me to think holistically about systems, their purpose, and their impact on human experience.

Lasting Legacy

This 12-week experience has been characterized as "the most memorial experience in my life," with lessons and perspectives that will be carried forward into all future engineering endeavors. The combination of technical excellence and philosophical depth created a uniquely enriching internship experience.

Skills Developed

AI & Machine Learning

RAG (Retrieval-Augmented Generation)
Vector Databases
LangGraph
Model Context Protocol (MCP)
Re-ranking Models
Semantic Search
LLM Integration

Database & Infrastructure

Milvus Vector Database
Docker Containerization
System Architecture
Performance Optimization
Cross-Collection Querying
Database Connection Management

Security & Testing

Prompt Injection Defense
Multi-LLM Architecture
Automated Security Testing
YAML Configuration
Vulnerability Assessment
Security Framework Design

Development & Analysis

Python
Performance Analysis
F1 Score Evaluation
Multilingual Testing
Data-Driven Decision Making
System Integration

Project Continuity & Future Improvements

During the final weeks of the internship, I enhanced the MCP server by migrating from stdio to streamable-http architecture, significantly improving calling efficiency and preparing the system for production-scale deployment. Additionally, I proposed using UV for enhanced Python environment management across the organization, demonstrating ongoing commitment to system optimization and developer experience improvements.

The experience culminated in presenting my work to the 16-person TDG team during the 8th week, sharing both technical achievements and lessons learned with colleagues who will continue building upon this foundation.

← Back to All Experiences