Projects

Current builds and technical experiments. Open to feedback or a cleaner implementation.

Adversarial Attacks on Deep Learning Models

In this project, we explored the role of Adversarial Machine Learning in Natural Language Processing, focusing on how carefully crafted inputs can expose vulnerabilities in ML models. Our goal was twofold: to evaluate model robustness and to contribute toward building more secure NLP systems.

We implemented and tested both character-level attacks, such as flips, spacing tricks, and inner-letter shuffles, and word-level attacks, including transformer-based strategies like BERT Attack.
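The character-level attacks named above can be sketched in a few lines. This is a minimal illustration, not the project's actual attack code: the function names, the `rate` parameter, and the 28-letter `ARABIC_LETTERS` alphabet are assumptions made for the example.

```python
import random

# Assumption: the basic 28-letter Arabic alphabet as the substitution pool.
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"

def flip_char(word, rng, alphabet=ARABIC_LETTERS):
    """Replace one randomly chosen character with a different letter."""
    if not word:
        return word
    i = rng.randrange(len(word))
    choices = [c for c in alphabet if c != word[i]]
    return word[:i] + rng.choice(choices) + word[i + 1:]

def insert_space(word, rng):
    """Split a word in two by inserting a space at a random inner position."""
    if len(word) < 2:
        return word
    i = rng.randrange(1, len(word))
    return word[:i] + " " + word[i:]

def shuffle_inner(word, rng):
    """Shuffle the inner letters, keeping the first and last in place.

    The shuffle may occasionally return the word unchanged.
    """
    if len(word) < 4:
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

def perturb(sentence, attack, rng, rate=0.3):
    """Apply `attack` to roughly `rate` of the whitespace-separated words."""
    words = sentence.split()
    return " ".join(attack(w, rng) if rng.random() < rate else w for w in words)
```

A call such as `perturb(text, shuffle_inner, random.Random(0))` yields a perturbed copy of `text`; passing a seeded `random.Random` keeps the perturbations reproducible across runs.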

What set our work apart was its focus on the Arabic language, which remains underexplored in this domain. Arabic’s complex morphology and script pose distinct challenges, making it a demanding testbed for adversarial techniques. Our findings contribute to understanding how NLP models can be better secured across diverse languages.

PyTorch, NLP, Research, Adversarial ML
Arabic Reverse Dictionary

A comparative NLP study that benchmarks five architectures on the task of mapping Arabic definitions to their correct words: given a description, predict the term. Built on a merged dataset of ~97,000 entries from multiple Arabic lexical sources, with a custom CAMeL Tools preprocessing pipeline handling diacritics, orthographic normalization, and lemmatization.
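To give a sense of what diacritic removal and orthographic normalization involve, here is a rough standard-library sketch. It stands in for the CAMeL Tools pipeline rather than reproducing it: the specific rules (folding alif variants, alif maqsura, and ta marbuta) are common conventions assumed for the example, and lemmatization is left out entirely.

```python
import re

# Tanween, short vowels, shadda, sukun (U+064B-U+0652) and dagger alif (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character used for justification

def normalize_arabic(text):
    """Strip diacritics and fold common orthographic variants (assumed rules)."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    # Alif with madda / hamza above / hamza below -> bare alif.
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)
    # Alif maqsura -> ya.
    text = text.replace("\u0649", "\u064A")
    # Ta marbuta -> ha (lossy, but frequently applied before matching).
    text = text.replace("\u0629", "\u0647")
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()
```

Folding of this kind trades some lexical distinctions for robustness to inconsistent spelling, so whether each rule helps depends on the source dictionaries being merged.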

The project progresses through TF-IDF (18.2% Top-1), FastText + FAISS semantic search (15.0%), six fine-tuned Arabic Transformer models including CamelBERT and MARBERT (27.6% Top-1 after contrastive NT-Xent training), and finally generative LLMs with Retrieval-Augmented Generation using ChromaDB and multilingual-e5-base embeddings. Qwen3.5-4B with RAG achieved 39.8% Top-1 under morphological evaluation with no fine-tuning, running entirely on local Apple Silicon.
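For readers unfamiliar with the NT-Xent objective mentioned above, the following is a dependency-free sketch of the loss on plain Python lists. It assumes each definition embedding in `view_a` is paired with its target-word embedding at the same index in `view_b`; that pairing, the function names, and the default temperature are assumptions for illustration, not details of the project's training code.

```python
import math

def _cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent_loss(view_a, view_b, temperature=0.1):
    """Normalized temperature-scaled cross-entropy over N positive pairs.

    view_a[i] and view_b[i] form a positive pair; every other embedding
    in the combined batch of 2N acts as a negative.
    """
    n = len(view_a)
    z = list(view_a) + list(view_b)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        # Similarities to every other embedding (the anchor itself is excluded).
        logits = [_cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        positive_logit = _cosine(z[i], z[positive]) / temperature
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - positive_logit
    return total / (2 * n)
```

The loss is small when each pair is more similar than any mismatched combination in the batch and grows when positives are indistinguishable from negatives, which is what pushes definitions toward their target words in embedding space.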

Evaluation goes beyond standard accuracy: a custom morphological normalization layer accounts for Arabic inflection, and additional metrics track output coverage, repetition rate, language consistency, and per-domain performance breakdown. Each approach is analyzed for why it succeeds or fails on Arabic specifically, with findings including why TF-IDF outperforms static embeddings on short glosses, how contrastive fine-tuning roughly doubles Transformer accuracy, and why generative models fundamentally change the OOV problem.
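The idea of scoring Top-k accuracy under a normalization layer can be illustrated as follows. The `strip_definite_article` normalizer is a deliberately crude, hypothetical stand-in; the project's morphological normalization is more involved and is not reproduced here.

```python
def top_k_accuracy(ranked_predictions, gold_words, k=1, normalize=lambda w: w):
    """Fraction of examples whose gold word appears in the top-k candidates.

    `normalize` is applied to both sides before comparison, so inflected or
    variant forms can count as matches.
    """
    if not gold_words:
        return 0.0
    hits = 0
    for candidates, gold in zip(ranked_predictions, gold_words):
        target = normalize(gold)
        if any(normalize(c) == target for c in candidates[:k]):
            hits += 1
    return hits / len(gold_words)

def strip_definite_article(word):
    """Toy normalizer (assumption): drop a leading definite article 'ال'."""
    if word.startswith("ال") and len(word) > 3:
        return word[2:]
    return word

predictions = [["الكتاب", "قلم"], ["بيت", "منزل"]]
gold = ["كتاب", "منزل"]

strict = top_k_accuracy(predictions, gold, k=1)
lenient = top_k_accuracy(predictions, gold, k=1, normalize=strip_definite_article)
```

In this toy data the strict score misses the first example only because of the definite article, which is the kind of surface mismatch a morphology-aware comparison is meant to forgive.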

Arabic NLP, PyTorch, Scikit-Learn, Hugging Face Datasets, Transformers, FAISS, ChromaDB, CAMeL Tools, MLX, LLMs, RAG
Archery Score Detection

Built an end-to-end computer vision system that automates archery scoring from a single photograph. The pipeline ingests a target face image, corrects for perspective distortion using HSV-based ellipse detection and affine warping, runs a fine-tuned YOLO26s object detection model to locate arrow tips and target geometry, then maps pixel-space distances to World Archery ring values through a custom geometric scoring engine. The system outputs per-arrow scores, a total, and automatic boundary flags for arrows near ring edges that require manual verification.
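The geometric scoring step can be sketched as below, assuming the image has already been rectified so the rings are concentric circles. It models a standard 10-zone face with equal-width rings and the rule that an arrow touching a dividing line takes the higher value; the function name, the `shaft_radius` and `flag_margin` parameters, and the omission of the inner-10 (X) ring are assumptions for the example, not the project's engine.

```python
import math

def score_arrow(tip, center, outer_radius, shaft_radius=0.0, flag_margin=0.0):
    """Return (score, needs_review) for one arrow on a 10-zone target face.

    All lengths are in pixels of the rectified image. `shaft_radius` lets the
    shaft edge nearest the center decide the ring (line-cutter rule), and
    `flag_margin` is how close to a dividing line counts as a boundary case.
    """
    ring_width = outer_radius / 10.0
    distance = math.hypot(tip[0] - center[0], tip[1] - center[1])
    effective = max(distance - shaft_radius, 0.0)

    # Dividing lines sit at multiples of the ring width, out to the outer edge.
    nearest_line = min(abs(effective - k * ring_width) for k in range(1, 11))
    needs_review = nearest_line <= flag_margin

    if effective > outer_radius:
        return 0, needs_review  # outside the scoring area

    # ceil(...) - 1 places an arrow exactly on a line in the higher-value ring.
    ring_index = max(math.ceil(effective / ring_width) - 1, 0)
    return 10 - ring_index, needs_review
```

With `outer_radius=100`, a tip 29 pixels from the center scores 8, and a `flag_margin` of 2 marks it for review because it lies 1 pixel from the 8/7 line.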

The model was trained on a curated dataset of 525 images built from 3,700 raw samples across four Roboflow repositories, with a custom preprocessing script to convert polygon segmentation labels into YOLO-compatible bounding boxes. Two models were benchmarked (YOLO26s and YOLOv12s) with YOLO26s selected for production based on superior arrow tip detection on the test set (0.673 vs 0.608 mAP50) and approximately 2x faster inference at 6.3ms per image. Overall test mAP50 reached 0.875 with a tight 3.8% generalization gap from validation.
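The label conversion described above amounts to taking the axis-aligned extent of each polygon. The sketch below assumes the source labels use the YOLO segmentation layout (`class x1 y1 x2 y2 ...` with coordinates normalized to 0-1); the function name is hypothetical and this is not the project's preprocessing script.

```python
def polygon_line_to_bbox(line):
    """Convert one YOLO segmentation label line to a detection label line.

    Input:  "class x1 y1 x2 y2 ... xn yn"   (normalized polygon vertices)
    Output: "class x_center y_center width height"   (normalized box)
    """
    parts = line.split()
    class_id = parts[0]
    coords = [float(v) for v in parts[1:]]
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("expected at least three (x, y) vertex pairs")

    xs = coords[0::2]
    ys = coords[1::2]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    x_center = (x_min + x_max) / 2
    y_center = (y_min + y_max) / 2
    width = x_max - x_min
    height = y_max - y_min
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

Applied line by line to each label file, this keeps the class index untouched and discards only the polygon's shape, which a box detector does not use.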

Python, Computer Vision, YOLO, Roboflow, OpenCV
Note-ish

Note-ish is a fast, privacy-first productivity app built in Python for the terminal. It runs entirely offline and helps users manage tasks, habits, notes, journals, and calendar events from one simple interface.
Designed for minimalism and speed, it’s ideal for developers who prefer working in the command line without sacrificing functionality or privacy.

Python