Hello, I am Toan (Thomas) Dao

I am a Data Scientist specializing in building ML models, MLOps pipelines, and data-driven solutions.

Contact Me

About Me

I’m Toan Dao (Thomas), a Senior Data Scientist with a strong background in both data science and software engineering. I have led teams at companies like Unilever and MoMo, optimizing forecasting accuracy and ad decision systems.

I excel in building scalable machine learning solutions, creating efficient MLOps pipelines, and driving impactful results through innovative projects.

Projects & Leadership

EfficientDet.Pytorch

Implemented EfficientDet from scratch in PyTorch for training, evaluation, and inference across diverse datasets. Collaborated with Shanghai Jiao Tong University to reduce the parameter count by 15% without accuracy loss.

View on GitHub

Time Series Demand Forecasting Model (Unilever)

Developed and deployed a time series demand forecasting model that increased forecast accuracy by 15%, streamlining supply chain and inventory processes.

MLOps Pipeline Implementation (Unilever)

Led the design and implementation of MLOps pipelines using Databricks and MLflow, automating model lifecycle management for over 10 pipelines.

Ad Decision Engine (MoMo)

Designed an Ad Decision engine using LightGBM and PID controllers, optimized for 1,200 RPS and 100 ms latency, improving eCPM by 40% and CTR by 60%.

Real-time Ad Capping System (MoMo)

Built a real-time Ad Capping system using Pub/Sub and Redis to process over 30M events per day, preventing overspending and enhancing ad delivery efficiency.

Frequently Bought Together Recommendation (One Mount Group)

Developed a real-time Frequently Bought Together recommendation model that boosted the average order value by 20%.

Ensemble Graph Embedding & Sequence-Based Recommendation (One Mount Group)

Implemented an ensemble model for graph embedding and sequence-based recommendation that increased conversion rates by 35%.

Personalized Uplift Model (One Mount Group)

Created a personalized Uplift model using XGBoost that reduced cost-per-order by 60% and doubled customer transaction frequency.

RCNN-based OCR Model (Cinnamon)

Enhanced OCR accuracy by 17% through implementing RCNN-based models, achieving state-of-the-art performance.

End-to-End Text Detection & Recognition (Cinnamon)

Built an end-to-end model for text detection and recognition, reducing model parameters by 10% and cutting Google Cloud Platform costs by 25%.

Leaderboard Redesign (GameLoft)

Redesigned the leaderboard module using Java and Redis, reducing system latency by 30%.

CI/CD Pipeline Automation for Dungeon Hunter (GameLoft)

Streamlined CI/CD pipelines by implementing Jenkins to automate build, test, and deployment processes.

Data Processing & REST API Integration (FPT Software)

Accelerated data processing by 5x using parallelization and implemented a REST API to aggregate over 20 system metrics.

Get in Touch

I’m always open to discussing new opportunities, interesting projects, or how to make the best data-driven decisions.

Send me an email

Or find me on GitHub and LinkedIn