I am a Data Scientist specializing in building ML models, MLOps pipelines, and data-driven solutions.
I’m Toan Dao (Thomas), a Senior Data Scientist with a strong background in both data science and software engineering. I have led teams at companies including Unilever and MoMo, improving forecasting accuracy and optimizing ad decision systems.
I excel at building scalable machine learning solutions, creating efficient MLOps pipelines, and driving impactful results through innovative projects.
Implemented EfficientDet from scratch in PyTorch for training, evaluation, and inference across diverse datasets. Collaborated with Shanghai Jiao Tong University to reduce the parameter count by 15% without loss of accuracy.
Developed and deployed a time-series demand forecasting model that increased forecast accuracy by 15%, streamlining supply chain and inventory processes.
Led the design and implementation of MLOps pipelines using Databricks and MLflow, automating model lifecycle management for over 10 pipelines.
Designed an Ad Decision engine using LightGBM and PID controllers to serve 1,200 requests per second (RPS) at 100 ms latency, improving eCPM by 40% and CTR by 60%.
Built a real-time Ad Capping system using Pub/Sub and Redis to process over 30M events per day, preventing overspending and enhancing ad delivery efficiency.
Developed a real-time Frequently Bought Together recommendation model that boosted the average order value by 20%.
Implemented an ensemble of graph-embedding and sequence-based recommendation models that increased conversion rates by 35%.
Created a personalized Uplift model using XGBoost that reduced cost-per-order by 60% and doubled customer transaction frequency.
Enhanced OCR accuracy by 17% by implementing RCNN-based models, achieving state-of-the-art performance.
Built an end-to-end model for text detection and recognition, reducing model parameters by 10% and optimizing costs by 25% on Google Cloud Platform.
Redesigned the leaderboard module using Java and Redis, improving system latency by 30%.
Streamlined CI/CD pipelines by using Jenkins to automate build, test, and deployment processes.
Accelerated data processing by 5x using parallelization and implemented a REST API to aggregate over 20 system metrics.
I’m always open to discussing new opportunities, interesting projects, or how to make the best data-driven decisions.