Translation Model Benchmark for Multilingual Video Transcripts
A multilingual benchmark comparing Google Translate, DeepL, and Llama 4 Maverick on noisy video transcript data across 15 languages.
nikolasbakalis.com
Personal work, research, and technical writing
A focused archive of backend systems work, data infrastructure benchmarks, ML/NLP research, and technical writing, intended as context for collaborations, applications, and deeper engineering conversations.
Selected research
A multilingual benchmark comparing Google Translate, DeepL, and Llama 4 Maverick on noisy video transcript data across 15 languages.
A 20-million-row benchmark comparing RDS/Postgres serving tables with StarRocks OLAP tables and async materialized views.
An API-level comparison of denormalized RDS tables and StarRocks async materialized views at 100K, 1M, and 10M rows.
A training report for an in-house hierarchical IAB 3.0 content classifier built to replace external classification dependencies.
A benchmark of API frameworks and datastore access patterns for data-intensive services spanning PostgreSQL, StarRocks, and OpenSearch.
A project proposal for replacing external classification APIs with in-house IAB 3.0 classification and language detection pipelines at media-corpus scale.
Tools
The stack changes by project, but the recurring work is around reliable data paths, measurable model behavior, and production API surfaces.
New York, NY
I work on data-intensive backend systems, evaluation workflows, and ML/NLP pipelines where correctness, performance, and operational tradeoffs matter. This site collects dated reports and implementation notes so the work can be reviewed directly instead of reduced to a resume bullet.