June 25, 2025 - September 12, 2025

Software Engineering Intern

at Ubitus K.K. | Terminal Device Group (TDG)

Taiwan, Taipei

During my 12-week internship at Ubitus K.K., I joined the Terminal Device Group (TDG) working across web frontend, 3D, and CNF (Console and Frontier) teams. This transformative experience involved developing cutting-edge AI infrastructure solutions, from semantic search optimization to company-wide security frameworks, while gaining profound insights into engineering philosophy and problem-solving approaches.

Project Overview

My primary focus was developing a Model Context Protocol (MCP) server to enable UbiGPT, the company's large language model, to perform accurate and efficient semantic searches across multiple Milvus vector database collections. The project evolved through three distinct phases, each building upon the previous to create a comprehensive solution that achieved a 77.8% F1 Score.

Beyond the core MCP project, I architected a company-wide prompt injection defense framework and served as the primary technical operator for the G1 humanoid robot at the AI WAVE exhibition, providing a well-rounded experience in AI development, security, and real-world applications.

Phase 1: MCP Server Foundation

The journey began with inheriting and transforming a coworker's database search algorithm from milvus_memory.py into a robust, containerized MCP server. This foundational work established the groundwork for all subsequent enhancements.

System Architecture Development

Designed and built a Python-based MCP server to act as middleware between the UbiGPT LLM and the Milvus vector database. The architecture required careful consideration of connection stability, data formatting, and error handling to ensure reliable operation in a production environment.

Key Technical Achievements

  • Successfully containerized the existing database search script using Docker
  • Established stable connection checks and consistent aliasing for MilvusDB
  • Engineered data standardization, transforming returned document format from a list of dictionaries into structured JSON strings
  • Implemented robust error handling and logging mechanisms
  • Created comprehensive documentation for future maintenance and scaling

This phase laid the critical groundwork for single collection searches while designing the system with extensibility in mind for future cross-collection capabilities.

Phase 2: Cross-Collection Search Architecture

The most intellectually challenging phase involved enabling searches across multiple database collections. This required innovative thinking about how to balance LLM cognitive load, system performance, and search accuracy.

Three Architectural Approaches

Method A: Naive Approach (Discarded)

Initially considered searching all collections simultaneously but proactively discarded this approach after identifying significant performance bottlenecks and resource consumption issues.

Method B: "Three Tools" Approach

Developed a sophisticated system providing the LLM with three discrete tools: list_collections, list_schemas, and search. This enabled agentic discovery and step-by-step query processing, giving the LLM maximum flexibility in determining search strategies.

Method C: "One Tool" Approach (Selected)

Engineered a more efficient solution that abstracts database complexity by pre-generating comprehensive descriptions of all collections and schemas. This approach reduced cognitive load on the LLM while maintaining search effectiveness and improving overall system performance.

Implementation Details

The "One Tool" approach involved creating intelligent abstractions that combined collection metadata with schema information into easily digestible formats for the LLM. This required careful consideration of information density, relevance filtering, and presentation formatting to optimize LLM comprehension without overwhelming the context window.

Phase 3: Search Result Optimization

The final phase focused on dramatically improving search accuracy by transitioning from basic distance metrics to sophisticated semantic analysis techniques.

From L2 Distance to Re-ranking Models

Initially, the system relied on L2 (Euclidean) distance for filtering search results. While computationally efficient, this approach often missed semantically relevant results that didn't align perfectly in vector space. The transition to re-ranking models represented a significant leap in search quality.

Re-ranking Implementation

Leveraged advanced re-ranking models to analyze semantic relevance between raw user query text and retrieved documents. This approach considers contextual meaning, synonyms, and conceptual relationships that pure vector distance calculations might miss.

Trade-off Analysis

L2 Distance Method

  • Pros: Faster computation, lower latency
  • Cons: Less nuanced, may miss relevant results
  • Use Case: High-volume, speed-critical applications

Re-ranking Model

  • Pros: Higher precision, semantic understanding
  • Cons: Increased computation time
  • Use Case: Quality-focused, accuracy-critical searches

After comprehensive analysis, the re-ranking approach was selected as the optimal solution, justifying the slight performance trade-off with significantly improved search relevance and user satisfaction.

Company-Wide Security Framework

Following the successful completion of the MCP server project, I was tasked with developing a comprehensive prompt injection defense framework to protect the company's AI applications from sophisticated attacks.

Multi-LLM Architecture

Designed a sophisticated system featuring separate target, controller, and improver LLMs working in concert to create a complete test-and-fix cycle for prompt vulnerabilities.

Framework Features

  • Comprehensive Test Coverage: 63 pre-built rules across 7 distinct attack categories
  • Automated Improvement: System automatically suggests fixes for vulnerable prompts
  • Flexible Evaluation: Condition-based pass/fail criteria customizable for each test scenario
  • YAML Configuration: Easy-to-modify rule sets with customizable conditions
  • Multiple Output Formats: Both human-readable logs and machine-parseable JSON export
  • Scalable Deployment: Designed for company-wide rollout across multiple AI applications

Impact and Results

The framework achieved a remarkable 100% mitigation rate against injection attacks across five AI applications, providing the QA team with powerful automated vulnerability assessment capabilities. This success led to company-wide adoption and integration into the standard testing pipeline.

Team Presentation

During the 11th week of my internship, I presented the security framework to managers and the QA team, demonstrating its capabilities and training team members on proper usage and customization techniques.

AI WAVE Exhibition - G1 Humanoid Robot

Beyond software development, I gained invaluable experience as the primary technical operator for the G1 humanoid robot at the AI WAVE exhibition, bridging the gap between cutting-edge AI technology and public engagement.

Exhibition Highlights

Live Robot Demonstration

Demonstrating the G1 humanoid robot's capabilities during live interactive sessions with exhibition visitors.

Technical Responsibilities

  • Real-time technical troubleshooting and system monitoring during live demonstrations
  • Interactive presentations for diverse audiences including industry professionals and general public
  • Data collection for future reinforcement learning model development and user behavior analysis
  • Cross-team coordination with Marketing and BD teams for seamless exhibition presentations
  • User interaction analysis and comprehensive feedback documentation for product development
  • Technical setup and calibration of robot systems for optimal exhibition performance

Skills Developed

This role required rapid problem-solving under pressure, clear communication with non-technical audiences, and the ability to explain complex AI concepts in accessible terms. The experience provided crucial insights into the practical deployment challenges of AI systems in real-world environments, while developing skills in public demonstration, technical presentation, and cross-functional collaboration in high-stakes exhibition settings.

Performance Analysis & Data-Driven Decision Making

A critical aspect of the project involved designing and executing comprehensive testing frameworks to validate system performance and guide architectural decisions.

Testing Methodology

Created extensive Google Sheets with multiple tabs to compare the "One Tool" and "Three Tools" methodologies across five critical evaluation criteria:

  • Accuracy: Using F1 Score as the primary metric to balance Precision and Recall
  • Runtime Performance: End-to-end latency analysis and bottleneck identification
  • Single vs. Cross-Collection: Comparative effectiveness across search scopes
  • Cross-Collection Keyword Search: Specialized testing for multi-collection scenarios
  • Multilingual Capabilities: Validation across English, Japanese, and Traditional Chinese

Key Performance Insights

Runtime Analysis

Discovered that 95% of processing time originated from LLM generation rather than database operations, informing future optimization strategies.

Accuracy Comparison

"One Tool" approach achieved superior F1 Score of 77.8% compared to 76.4% for the "Three Tools" method.

Speed Optimization

Overall runtime improved from 9,357ms to 9,194ms with the optimized architecture.

Multilingual Validation

Conducted comprehensive testing using identical queries translated into English, Japanese, and Traditional Chinese, confirming consistent performance and relevant results across all supported languages. This validation was crucial for the international deployment of the system.

Future Enhancement Proposals

Based on performance analysis, I proposed implementing LangGraph with a verification agent architecture to critique and refine outputs from the generation agent, potentially reducing errors without significant time costs. Additionally, identified opportunities for implementing UV for better Python environment management across the organization.

Mentorship & Philosophical Growth

Beyond technical achievements, this internship provided profound personal and philosophical growth through mentorship from Vic Chung and Melody Wang, whose insights fundamentally shaped my understanding of engineering and problem-solving.

Engineering Philosophy

My mentors taught me that engineering transcends mere coding—it's fundamentally about problem-solving. As they eloquently explained, "coding is just 描述事情的邏輯,是將邏輯具象化的工具" (coding is merely describing logic and making logic concrete). This perspective shift helped me understand that finding the right people to solve problems is as valuable as solving them yourself.

Profound Conversations

Our discussions explored deep philosophical questions that continue to influence my thinking:

  • Nature of Life: What defines a living creature and consciousness
  • Adaptation: The principle of 適者生存 (survival of the fittest) in technology and life
  • Dimensions of Existence: The relationship between 物質與精神 (material and spiritual dimensions)
  • Multi-dimensional Thinking: Understanding problems across different conceptual dimensions
  • Spiritual Technology: The intersection of technological advancement and human spirit
  • Metaphysical Questions: Discussions about higher-order organizing principles

Transformative Impact

These conversations were very worth continuing to think about deeply. They fundamentally changed my approach to engineering challenges, encouraging me to think holistically about systems, their purpose, and their impact on human experience.

Lasting Legacy

This 12-week experience has been characterized as "the most memorial experience in my life," with lessons and perspectives that will be carried forward into all future engineering endeavors. The combination of technical excellence and philosophical depth created a uniquely enriching internship experience.

Skills Developed

AI & Machine Learning

  • RAG (Retrieval-Augmented Generation)
  • Vector Databases
  • LangGraph
  • Model Context Protocol (MCP)
  • Re-ranking Models
  • Semantic Search
  • LLM Integration

Database & Infrastructure

  • Milvus Vector Database
  • Docker Containerization
  • System Architecture
  • Performance Optimization
  • Cross-Collection Querying
  • Database Connection Management

Security & Testing

  • Prompt Injection Defense
  • Multi-LLM Architecture
  • Automated Security Testing
  • YAML Configuration
  • Vulnerability Assessment
  • Security Framework Design

Development & Analysis

  • Python
  • Performance Analysis
  • F1 Score Evaluation
  • Multilingual Testing
  • Data-Driven Decision Making
  • System Integration

Project Continuity & Future Improvements

During the final weeks of the internship, I enhanced the MCP server by migrating from stdio to streamable-http architecture, significantly improving calling efficiency and preparing the system for production-scale deployment. Additionally, I proposed using UV for enhanced Python environment management across the organization, demonstrating ongoing commitment to system optimization and developer experience improvements.

The experience culminated in presenting my work to the 16-person TDG team during the 8th week, sharing both technical achievements and lessons learned with colleagues who will continue building upon this foundation.

← Back to All Experiences