AI Voice Daily Report System for Foreign Workers
A SaaS that automatically transcribes, translates to Japanese, analyzes sentiment, and assesses risk from voice recordings by foreign workers in their native language, allowing managers to provide feedback in Japanese. Supports 16 languages.
Challenge
Traditional interview-based care had high time constraints and psychological barriers, making adequate care difficult while employee mental health issues were increasing.
Solution
Developed a mental health care app leveraging AI translation and sentiment analysis. Provided an environment where employees can easily seek support through emotion alert features.
Result
Supported early detection of mental health issues, reducing leave-of-absence and turnover rates
Team
1 member, 2 months
PM & full engineering
Role
Responsible for everything from requirements to design, implementation, and operations.
Partnered with the business side, leading PM and engineering from the validation stage.
Tech Stack
Key Features
6-stage AI pipeline: Voice recording → Whisper transcription → GPT/Gemini translation → Sentiment analysis → Risk assessment → Summary & follow-up question generation
16 language support: Japanese, English, Vietnamese, Indonesian, Myanmar, Nepali, Filipino, Thai, etc.
Multi-tenant RBAC: Tenant isolation per workspace (RLS), 4-level role management
Customizable question templates: Flexible daily report formats adaptable to industry and workplace
AI cost optimization: Model tiering (GPT-4o-mini/Gemini Flash → Gemini Pro) and fallback chains for optimal cost performance
Monthly partitioning: Partition high-frequency tables to prevent performance degradation at scale
Technical Highlights
Security Architecture Improvement
Changed from public API endpoint-based analysis to internal trigger approach using Supabase Edge Function + pg_cron, eliminating the DDoS attack surface.
AI API Cost Optimization
Used GPT-4o-mini/Gemini Flash (cheap) for translation, Gemini Pro only for complex reasoning. Built fallback chain from Google NLP API → OpenAI for unsupported languages in sentiment analysis.
Multilingual Voice Processing Challenges
Improved accuracy for 16-language transcription using Whisper language parameters and domain-specific prompts. Addressed Safari voice recording compatibility with custom WebAudio API hooks.