Safaeid Hossain Arib

Learner, Researcher, Problem Solver

Hello, I am Safaeid Hossain Arib.

I am currently an ML Engineer at ACI Limited, where I work on agentic call center automation and video-based license plate recognition. Previously, I worked as an AI Engineer at Ontik Technology, where I explored practical ways to bring machine learning solutions to life. You can check out some of my recent work in the Projects section. Outside of work, I’m diving deep into projects focused on long-term time series forecasting, vision language models, amodal counting, and more.

I hold a B.Sc. in Robotics and Mechatronics Engineering from the University of Dhaka. During my undergraduate studies, I worked as a research assistant with Dr. Shafin Rahman and Dr. Sejuti Rahman, where I developed a continuous sign language translation system that achieved competitive performance.

When I’m not coding or reading research papers, you’ll likely find me on the basketball court, lifting at the gym, or off exploring new places. I love connecting with curious minds—feel free to reach out!

I am actively looking for opportunities to pursue a PhD starting in Fall 2026.

Research Interests: Multimodal Learning, Multimodal Grounding, Multimodal Understanding and Reasoning, Embodied AI.

Connect with me!

News and Updates

September 22, 2025:
Our work "T3Time" has been accepted at the AAAI 2026 Main Technical Track! 🚀

September 22, 2025:
Thrilled to share that our work "SignFormer-GCN" has been accepted at the prestigious WiML Workshop @ NeurIPS 2025! 🚀

September 01, 2025:
Excited to begin my journey as a Machine Learning Engineer at ACI Limited! 💻🤖

February 14, 2025:
Our paper "SignFormer-GCN" is officially published in PLOS ONE! 🎉

March 14, 2024:
Officially graduated from the Department of Robotics and Mechatronics Engineering, University of Dhaka. Grateful for an amazing 4-year journey! 🥳

February 01, 2024:
Proud to announce that I passed CFA Level 1 on my very first attempt! 📈✅

January 28, 2024:
Kicked off my career in the tech industry by joining Ontik Technology as an AI Engineer! 🤝

January 24, 2024:
Successfully defended my undergraduate thesis, a huge milestone in my academic journey! 🎓

September 25, 2021:
Honored to virtually present my poster "Classical machine learning approach for human activity recognition using location data" at UbiComp/ISWC 2021 🌍✨

Research

I'm interested in reinforcement learning, computer vision, natural language processing, machine learning, multimodal learning, human-robot interaction and robotics.

Completed

Counting Through Occlusion: Framework for Open World Amodal Counting
Safaeid Hossain Arib, Rabeya Akter, Abdul Monaf Chowdhury, Jubair Ahmed Sourov, Md Mehedi Hasan
Under Review at CVPR 2026
[arxiv] [code]
Abstract: Object counting has achieved remarkable success on visible instances, yet state-of-the-art (SOTA) methods fail under occlusion, a pervasive challenge in real-world deployment. This failure stems from a fundamental architectural limitation where backbone networks encode occluding surfaces rather than target objects, thereby corrupting the feature representations required for accurate enumeration. To address this, we present CountOCC, an amodal counting framework that explicitly reconstructs occluded object features through hierarchical multimodal guidance. Rather than accepting degraded encodings, we synthesize complete representations by integrating spatial context from visible fragments with semantic priors from text and visual embeddings, generating class-discriminative features at occluded locations across multiple pyramid levels. We further introduce a visual equivalence objective that enforces consistency in attention space, ensuring that both occluded and unoccluded views of the same scene produce spatially aligned gradient-based attention maps. Together, these complementary mechanisms preserve discriminative properties essential for accurate counting under occlusion. For rigorous evaluation, we establish occlusion-augmented versions of FSC-147 and CARPK spanning both structured and unstructured scenes. CountOCC achieves SOTA performance on FSC-147 with 26.72% and 20.80% MAE reduction over prior baselines under occlusion in validation and test, respectively. CountOCC also demonstrates exceptional generalization by setting new SOTA results on CARPK with 49.89% MAE reduction and on CAPTURE Real with 28.79% MAE reduction, validating robust amodal counting across diverse visual domains. Code will be released soon.

LAGEA: Language Guided Embodied Agents for Robotic Manipulation
Abdul Monaf Chowdhury, Akm Moshiur Rahman Mazumder, Rabeya Akter, Safaeid Hossain Arib
Under Review at ICLR 2026
[arxiv] [code]
Abstract: Robotic manipulation benefits from foundation models that describe goals, but today’s agents still lack a principled way to learn from their own mistakes. We ask whether natural language can serve as feedback, an error-reasoning signal that helps embodied agents diagnose what went wrong and correct course. We introduce LAGEA (Language Guided Embodied Agents), a framework that turns episodic, schema-constrained reflections from a vision language model (VLM) into temporally grounded guidance for reinforcement learning. LAGEA summarizes each attempt in concise language, localizes the decisive moments in the trajectory, aligns feedback with visual state in a shared representation, and converts goal progress and feedback agreement into bounded, step-wise shaping rewards whose influence is modulated by an adaptive, failure-aware coefficient. This design yields dense signals early when exploration needs direction and gracefully recedes as competence grows. On the Meta-World MT10 embodied manipulation benchmark, LAGEA improves average success over the state-of-the-art (SOTA) methods by 9.0% on random goals and 5.3% on fixed goals, while converging faster. These results support our hypothesis: language, when structured and grounded in time, is an effective mechanism for teaching robots to self-reflect on mistakes and make better choices. Code will be released soon.

TriFormer: A Tri-Modal Multi-Head Cross-Attentive Framework for Enhanced Time Series Forecasting
Abdul Monaf Chowdhury, Rabeya Akter, Safaeid Hossain Arib
Accepted at AAAI 2026
[arxiv] [code]
Abstract: Multivariate time series forecasting (MTSF) seeks to model temporal dynamics among variables to predict future trends. Transformer-based models and large language models (LLMs) have shown promise due to their ability to capture long-range dependencies and patterns. However, current methods often rely on rigid inductive biases, ignore intervariable interactions, or apply static fusion strategies that limit adaptability across forecast horizons. These limitations create bottlenecks in capturing nuanced, horizon-specific relationships in time-series data. To solve this problem, we propose T3Time, a novel trimodal framework consisting of time, spectral, and prompt branches, where the dedicated frequency encoding branch captures the periodic structures along with a gating mechanism that learns prioritization between temporal and spectral features based on the prediction horizon. We also propose a mechanism that adaptively aggregates multiple cross-modal alignment heads by dynamically weighting the importance of each head based on the features. Extensive experiments on benchmark datasets demonstrate that our model consistently outperforms state-of-the-art baselines, achieving an average reduction of 3.37% in MSE and 2.08% in MAE. Furthermore, it shows strong generalization in few-shot learning settings: with 5% training data, we see a reduction in MSE and MAE by 4.13% and 1.91%, respectively; and with 10% data, by 3.70% and 1.98% on average.

SignFormer-GCN: Continuous Sign Language Translation Using Spatio-Temporal Graph Convolutional Networks
Safaeid Hossain Arib, Rabeya Akter, Sejuti Rahman, Shafin Rahman
PLOS ONE
[paper] [code]
Abstract: Sign language is a complex visual language system that uses hand gestures, facial expressions, and body movements to convey meaning. It is the primary means of communication for millions of deaf and hard-of-hearing individuals worldwide. Tracking physical actions, such as hand movements and arm orientation, alongside expressive actions, including facial expressions, mouth movements, eye movements, eyebrow gestures, head movements, and body postures, using only RGB features can be limiting due to discrepancies in backgrounds and signers across different datasets. Despite this limitation, most Sign Language Translation (SLT) research relies solely on RGB features. We used keypoint features alongside RGB features to better capture the pose and configuration of the body parts involved in sign language actions and to complement the RGB features. Similarly, most SLT research has used transformers, which are good at capturing broader, high-level context and focusing on the most relevant video frames. Still, such approaches neglect the inherent graph structure associated with sign language and fail to capture low-level details. To solve this, we used a joint encoding technique combining a transformer and an STGCN architecture to capture the context of sign language expressions and the spatial and temporal dependencies on skeleton graphs. Our method, SignFormer-GCN, achieves competitive performance on the RWTH-PHOENIX-2014T, How2Sign, and BornilDB v1.0 datasets, showcasing its effectiveness in enhancing translation accuracy across different sign languages. The code is available at the following link: https://github.com/rabeya-akter/SignLanguageTranslation.

Classical machine learning approach for human activity recognition using location data
Safaeid Hossain Arib, Rabeya Akter, Sejuti Rahman, Shafin Rahman
UbiComp/ISWC '21 Adjunct: Adjunct Proceedings of the 2021 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2021 ACM International Symposium on Wearable Computers
[paper] [code]
Abstract: The Sussex-Huawei Locomotion-Transportation (SHL) recognition Challenge 2021 was a competition to classify 8 different activities and modes of locomotion performed by three individual users. There were four different modalities of data (Location, GPS, WiFi, and Cells), which were recorded from the users’ phones carried at the hip position. The train set came from user-1, and the validation and test sets were from user-2 and user-3. Our team, ‘GPU Kaj Kore Na’, used only the location modality for our test-set predictions in this year’s competition, as location data gave more accurate predictions while the remaining modalities were too noisy and contributed little to accuracy. In our method, we used a statistical feature set for feature extraction and a Random Forest classifier for prediction. We obtained a validation accuracy of 78.138% and a weighted F1 score of 78.28% on the SHL Validation Set 2021.

Bornil: An open-source sign language data crowdsourcing platform for AI enabled dialect-agnostic communication
Shahriar Elahi Dhruvo, Mohammad Akhlaqur Rahman, Manash Kumar Mandal, Md Istiak Hossain Shihab, A. A. Ansary, Kaneez Fatema Shithi, Sanjida Khanom, Rabeya Akter, Safaeid Hossain Arib, M.N. Ansary, Sazia Mehnaz, Rezwana Sultana, Sejuti Rahman, Sayma Sultana Chowdhury, Sabbir Ahmed Chowdhury, Farig Sadeque, Asif Sushmit
[arxiv]
Abstract: The absence of annotated sign language datasets has hindered the development of sign language recognition and translation technologies. In this paper, we introduce Bornil, a crowdsource-friendly, multilingual sign language data collection, annotation, and validation platform. Bornil allows users to record sign language gestures and lets annotators perform sentence and gloss-level annotation. It also allows validators to ensure the quality of both the recorded videos and the annotations through manual validation, in order to develop high-quality datasets for deep learning based Automatic Sign Language Recognition. To demonstrate the system’s efficacy, we collected the largest sign language dataset for the Bangladeshi Sign Language dialect, performed deep learning based Sign Language Recognition modeling, and reported the benchmark performance. The Bornil platform, BornilDB v1.0 Dataset, and the codebases are available on https://bornil.bengali.ai.

Projects

These projects span professional work, academic coursework, and independent personal initiatives.

Office Projects

Pivotal Conversations

Built a business process automation system that analyzes historical social media data for content creators, generating monthly insight-driven reports with personalized recommendations and visualizations based on statistical analysis. This agentic workflow significantly boosted client viewership, streamlined reporting, eliminated manual effort, and enhanced decision-making.
Tools Used: PyTorch, Supermetrics, Airtable, CrewAI

LookupMed

Developed a tool enabling patients to scan prescriptions, schedule medication dosages, and find alternative medicines. Leveraged an OCR and post‑processing pipeline to extract and validate drug names from prescription scans—ensuring alignment with reference data and achieving high accuracy. Collaborated with Dr. Javed Mostafa, Dean of the Faculty of Information at the University of Toronto, to pilot the system and validate its clinical and technical workflows.

LEGACY UNLOCKED

Designed and implemented an agentic workflow to extract and validate data from binary COBOL files into structured formats (CSV/TSV), ensuring consistency with copybook definitions. Successfully delivered demos to major consultancy firms like Oliver Wyman, showcasing how AI and agentic workflows can modernize legacy systems by unlocking data, enabling migration, generating insights, and supporting natural language interaction with enterprise data.
Tools Used: Python, pandas, NumPy, CrewAI

Course Projects

Automated Stock Trading Using Approximate Q Learning

RME 3211 (Intelligent Systems and Robotics Lab)
[code]

Developed an automated stock trading system using approximate Q-learning to optimize buy, sell, and hold decisions for three Bangladeshi stocks, achieving an 11% return on investment and demonstrating the effectiveness of reinforcement learning in financial decision-making.

A Comprehensive Comparative Analysis of Emotional Support Delivery by NAO Robots and Humans Across Varied Emotional States

RME 4211 (Human Robot Interaction Lab)
[code]

Conducted a comparative analysis of emotional support delivered by a NAO robot and by humans across 3 emotional states. Developed a system and questionnaire to quantify emotional impact, comfort levels, communication clarity, and overall satisfaction, with the goal of assessing the efficacy of robotic versus human emotional support and identifying areas for improvement.

Personal Projects

Project HeatAlert: Predicting Heatwaves in Dhaka

This project aims to develop a machine learning-based system to forecast heatwave events in Dhaka city using historical weather data. By analyzing patterns in temperature, humidity, and other meteorological indicators, the model provides early warnings to help mitigate the impact of extreme heat on public health and urban infrastructure.

Education

Below is a concise summary of my academic background to date.

■ CFA Level 1 [2024]
CFA Institute
- Status: Passed

- Notable Courses: Economics, Corporate Finance, Portfolio Management, Financial Reporting Analysis, Quantitative Analysis, Fixed Income Analysis, Professional Ethics, Equity Investment, Derivatives

■ Bachelor of Science in Robotics & Mechatronics Engineering [2019–2024]
Department of Robotics & Mechatronics Engineering,
University of Dhaka,
Dhaka, Bangladesh
- CGPA: 3.67/4.00 (Final Year: 3.77/4.00)

- Final Year Project: Continuous Bangla Sign Language Translation: Mitigating the Expense of Gloss Annotation with the Assistance of Graph [Presentation]

- Notable Courses: Artificial Intelligence, Introduction to Machine Learning, Digital Image Processing and Robot Vision, Digital Signal Processing, Introduction to Robotics, Advanced Robotics, Intelligent Systems and Robotics, Human Robot Interaction, Object Oriented Programming, Fundamentals of Programming, Fundamentals of Computing, Microcontroller and Programmable Logic Controller, Digital Logic Circuit and Microprocessor, Control System Design, Linear Algebra, Differential and Integral Calculus, Multivariate and Vector Calculus, Differential Equations and Coordinate Geometry, Mathematical Analysis for Engineers, Statistics for Engineers, Fundamentals of Electrical and Electronics Engineering, Fundamentals of Mechanical Engineering, Fundamentals of Mechatronics Engineering, Advanced Mechatronics Engineering, Industrial Management

■ Higher Secondary School Certificate (HSC) [2016–2018]
Notre Dame College,
Dhaka, Bangladesh
- GPA: 5.00/5.00

■ Secondary School Certificate (SSC) [2008–2016]
Dhaka Residential Model College,
Dhaka, Bangladesh
- GPA: 5.00/5.00

Work Experience

I am committed to pursuing a research-driven career, whether in academia or industry. Below are some highlights of my professional experiences so far.

■ ML Engineer [Sep 2025 – Present]
ACI Limited
Team: Machine Learning
◦ Developing an automated call center bot using an agentic approach to handle customer interactions, with integrated Bangla Automatic Speech Recognition and Text-to-Speech for enhanced user experience.
◦ Designing a video-based truck identification system through license plate recognition, addressing challenges of low-resolution imagery to improve detection and OCR accuracy.

■ AI Engineer [Feb 2024 – Jul 2025]
Ontik Technology
Team: Machine Learning
◦ Built a business process automation system that uses an agentic workflow to analyze historical social media data for content creators and generate monthly insight-driven reports (PDF/PowerPoint) with personalized recommendations and visualizations based on statistical analysis. This solution significantly boosted client viewership and streamlined reporting, eliminating manual effort and enhancing decision-making.
◦ Designed and implemented an agentic workflow to extract and validate data from binary COBOL files into structured formats (CSV/TSV), ensuring consistency with copybook definitions. Delivered demos to major consultancy firms such as Oliver Wyman, showing how AI-powered data insights can modernize legacy systems and accelerate digital transformation.
◦ Created a Power BI dashboard that lets the Business and Finance teams track organizational progress and make data-driven decisions.
◦ Designed a prototype Retrieval-Augmented Generation (RAG) chatbot to automate customer support.
◦ Led the development of Project Ongaurd-M, serving both as a machine learning engineer and the point of contact (PoC), creating a tool to help patients scan prescriptions, schedule medication dosages, and find alternative medicines.
◦ Conducted financial planning and analysis for LazyChat, spanning competitor analysis, go-to-market strategy, value proposition, market positioning, and pricing strategy development.

■ Undergraduate Research Assistant [Jan 2023 – Jan 2024]
Department of Robotics and Mechatronics Engineering, University of Dhaka
Supervisor: Dr. Sejuti Rahman
◦ Designed and implemented a novel architecture for BdSL translation, establishing the first benchmark in this low-resource language domain.
◦ Authored a manuscript detailing the methodology and findings, published in PLOS ONE (Q1 journal).

Awards and Scholarships

Over the years, I’ve been honored with a few recognitions—made possible by the invaluable support of mentors and peers who have greatly influenced my growth.

■ Research Grant

Special Innovation Fund, Bangladesh Ministry of Science and Technology (MoST), Fiscal Year 2023–2024
Project ID: SRG-232431

■ Hackathons

Honourable Mention, NASA Earth Observatory (EO) Dashboard Hackathon 2021
Organized by NASA, ESA, and JAXA
11th Place, Sussex-Huawei Locomotion Challenge 2021
Organized by University of Sussex and Huawei

Leadership

I dedicate part of my time to serving communities. Here are a few selected contributions.

■ Student Activity Secretary [Jan 2023 – Dec 2023]
IEEE Robotics and Automation Society Student Branch, University of Dhaka
◦ Organized several workshops, interactive sessions, and industry expert experience-sharing sessions.

■ Mentor [Jul 2019 – Dec 2020]
Bangladesh Robot Olympiad (BdRO)
◦ Conducted robotics workshops in multiple high schools and developed engaging quizzes for BdRO.

■ Mentor [Jun 2024 – Sep 2024]
Bangladesh Artificial Intelligence Olympiad (BdAIO)
◦ Designed and delivered lessons on classification algorithms and evaluation metrics for a group of 10 high school students, and created quizzes to assess their understanding.