Applied AI Systems Engineer & ML Researcher

Architecting production-scale AI systems across GenAI, agentic architectures, computer vision, and document intelligence.

  • Autonomous AI agents
  • Enterprise GenAI systems
  • Computer vision pipelines
  • Document intelligence

4+ Years in AI
17+ Projects Delivered
3 Peer-Reviewed Papers
15+ Research Citations

About Me

Purushothaman Natarajan

Applied AI Engineer & ML Researcher

Over 4 years delivering scalable ML platforms that transform complex data and workflows into reliable enterprise solutions.

Specialized in Large Language Models, RAG pipelines, multi-agent automation, multimodal AI, and explainable ML. Proven track record delivering scalable AI platforms that reduce manual effort, improve model reliability, and automate end-to-end workflows across vulnerability intelligence, document AI, and visual understanding systems.

Currently at Blackstraw AI, I architect enterprise AI solutions. Previous experience includes Amazon ML systems, DRDO-sponsored research programs, and multiple peer-reviewed publications. Strong focus on translating research-driven models into deployable, monitored, and high-impact production systems.

Professional Journey

Nov 2024 - Present

Data Scientist — Computer Vision & NLP

Blackstraw AI, Chennai

Production-grade Agentic AI, Computer Vision, and Document Intelligence Systems

  • LLM & Agentic AI: Architected production LLM and agentic AI systems automating enterprise workflows across document intelligence, vulnerability research, and product pipelines, reducing manual effort by 91%.
  • Search-Augmented LLM: Built search-augmented LLM frameworks integrating SERP and external knowledge for structured reasoning and automated attribute extraction.
  • Computer Vision Pipelines: Designed scalable CV and embedding-based pipelines for retail classification and similarity search supporting 200K+ product labels.
  • Advanced OCR Stack: Developed an OCR and document intelligence stack (SegFormer, DeepLabV3+, ViT-STR, super-resolution), reducing manual document review by 85%.
  • Document Understanding: Built layout-aware document pipelines improving structured field extraction and reading-order consistency in complex documents.
  • PII Detection: Implemented automated PII detection and redaction using NLP and embedding similarity matching across documents and images.
  • Multi-Agent Workflows: Designed a CrewAI-based automated codebase migration workflow (Oracle to MySQL, Java to Python) with validation stages.
  • Client Engagement: Led demos and technical walkthroughs, and translated business requirements into AI pipeline designs.
Python, PyTorch, Transformers, LangChain, LangGraph, CrewAI, AutoGen, DSPy, RAG, Vector DB, FAISS, Milvus, Pinecone, FastAPI, Docker, Kubernetes, YOLO, SAM, SegFormer, DeepLabV3+, ViT-STR, SQL, PostgreSQL, Neo4j

Sept 2023 - Oct 2024

Research Fellow — Explainable AI

SRM Institute of Science & Technology

DRDO-funded research on XAI for defense-grade sonar systems

  • DRDO Fellowship: Selected for DRDO-funded PhD research focused on explainable AI for defense sonar applications.
  • Sonar Detection: Developed deep learning models for underwater object detection under low-signal, noisy conditions, achieving 94% accuracy.
  • Explainability Stack: Integrated LIME, SHAP, and Grad-CAM for interpretable predictions validated by NPOL and DRDO teams.
  • Self-Explainable Models: Delivered mission-critical AI systems tested for operational underwater surveillance.
  • Research Publications: 3 peer-reviewed papers on XAI, synthetic data generation, and sonar imagery analysis.
PyTorch, YOLO, LIME, SHAP, Grad-CAM, Diffusion Models, GAN, OpenCV, AWS

Jul 2022 - May 2023

Machine Learning Associate

Amazon, Chennai

ML systems for Alexa, Ring, and Halo products

  • Data Quality: Improved annotation and data quality pipelines using NLP preprocessing and normalization workflows.
  • Standardization: Standardized labeling and validation processes, reducing model noise and improving training consistency.
  • Monitoring: Built performance dashboards and QA metrics enabling faster feedback cycles and optimization.
Python, NLP, Data Pipeline, AWS

Aug 2021 - Nov 2021

Customer Support Executive

Amazon, Coimbatore, IN

Support analytics and real-time dashboards for customer operations

  • Developed data cleaning and preprocessing pipelines using Python libraries.
  • Designed real-time prediction dashboards with Tableau, monitoring system health and performance metrics.

Education

PhD in Computer Science

Feb 2024 - Aug 2024

SRM University - Drop-out

Program discontinued after foundational research; continuing applied work externally.

Coursework

  • Computer Architecture
  • Artificial Intelligence
  • Comparison of Learning Algorithms
  • Computational Theory

M.Tech in Data Science

Sept 2022 - Sept 2024

BITS Pilani - CGPA: 8.38/10

Coursework

  • Data Science
  • Applied Machine Learning
  • Deep Learning
  • Natural Language Processing
  • Information Retrieval
  • Artificial and Computational Intelligence

B.E. in Mechanical Engineering

Aug 2015 - Nov 2020

Anna University - CGPA: 6.50/10

Coursework

  • Design Thinking
  • Engineering Mechanics
  • Thermodynamics
  • Fluid Mechanics
  • Manufacturing Engineering

Featured Projects

Enterprise-grade AI systems in production

Enterprise

Autonomous Vulnerability Intelligence Platform

AI-powered platform in which LLM agents ingest and reason over live CVE feeds, research exploitability, and autonomously generate validated remediation scripts. A multi-agent system with RAG and tool-calling orchestration performs triage, exploitability scoring, and automated remediation packaging; remediations are validated in Kubernetes testbeds and deployed through enterprise pipelines such as Intune.

Python, LLM Agents, AI Foundry, RAG, Vector DB, Tool Calling, Multi-Agent Orchestration, Kubernetes, Docker, DevOps, FastAPI, CVE Intelligence

Enterprise

Agentic Product Packaging Intelligence System

Search-augmented DSPy pipelines combine SERP retrieval with LLM reasoning to automate standards-based packaging code mapping and attribute extraction.

DSPy, Python, SERP API, LLM Reasoning, NLP Pipelines

Enterprise

Retail Vision Intelligence — Edge Detection & Classification

Edge-deployable detection combined with vector-search classification over 200K+ classes; embedding retrieval outperforms a softmax classifier and stays real-time on shelf cameras.

YOLO, Image Embeddings, Vector Search, OpenCV, PyTorch, Edge Optimization

Enterprise

Logistics Asset Detection & Similarity Tracking

Detects pallets, boxes, and forklifts, and uses embedding similarity to match packages across warehouses; SimCLR features reduce mismatches and anomalies.

YOLO, SimCLR, CNNs, Embedding Search, OpenCV, PyTorch

Enterprise

Enterprise Document Intelligence Platform — OCR Receipts

Production OCR pipeline with FFT-former deblurring and SwinIR/BSRGAN super-resolution; cut super-resolution latency from 16s to 2–3s and improved field extraction accuracy.

OCR, SwinIR, BSRGAN, FFT-former, NLP, Data Post-processing

Enterprise

Agentic Item Description Expansion

LangChain/LangGraph multi-agent flow expands terse product titles into rich, validated descriptions via retrieval, reasoning, and tool-calling.

LangChain, LangGraph, LLM Agents, LMDB, Python, Prompt Engineering, Retrieval Pipelines

Enterprise

Hierarchical LLM Decision Automation Engine

Converted 3,600+ item-coding playbooks into hierarchical LLM workflows with DSPy prompt optimization, rule validation, and noise-filtered image/text inputs.

DSPy, Python, LLMs, Hierarchical Reasoning, Rule-based Validation

Open Source

Doctane — Modular OCR & Document AI Framework

Open-source OCR/document-understanding stack with modular pipelines for extraction, preprocessing, and analysis.

Python, Transformers, PyTorch, NLTK, Streamlit

Research

Explainable Remote Sensing Vision Framework

Transfer-learning scene classifier with LIME + Grad-CAM explainability for remote sensing imagery.

Python, TensorFlow, LIME, Grad-CAM, Gradio

Side Project

Q&A Chatbot from PDF

Chatbot for answering queries from PDFs using BERT.

Python, Transformers, NLTK, Gradio, TensorFlow, BERT

Publications & Research

Peer-reviewed work in AI, XAI, and synthetic data

Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting

2024

Purushothaman Natarajan, Kamal Basha, Athira Nambiar

Dual diffusion models + GPT prompting for high-quality sonar image generation, enabling large-scale data augmentation for underwater object detection in defense applications.

Open Source, models for generating synthetic Under-Water SONAR Images, Image Generation

VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models

2024

Purushothaman Natarajan, Athira Nambiar

Integration of SHAP, Segment Anything Model, and Vision Language Models for human-understandable textual explanations of image classifier predictions across ImageNet and sonar imagery.

10+ Citations, Under-Water SONAR Imagery

Underwater SONAR Image Classification and Analysis using LIME-based Explainable AI

2024

Purushothaman Natarajan, Athira Nambiar

Explainable AI techniques for classifying seabed objects in underwater SONAR datasets, with DRDO validation for critical-domain use.

DRDO Funded, Open Source

Technical Expertise

Core AI Specialization

Generative AI
LLM Systems
Agentic AI
RAG Pipelines
Computer Vision
Multimodal AI
Production ML
XAI

LLMs & GenAI

GPT
Claude
LLaMA
Mistral
Prompt Engineering
Fine-tuning
LoRA
Tool Calling

Agent Frameworks

DSPy
LangChain
LangGraph
AutoGen
CrewAI
Multi-Agent Orchestration
A2A Communication
OpenAI SDK

ML & Deep Learning

Transformers
CNNs
RNNs
Diffusion Models
GANs
Autoencoders
Reinforcement Learning
Representation Learning

Computer Vision & OCR

YOLO
SAM
SegFormer
DeepLabV3+
ViT-STR
Image Classification
Object Detection
Document AI

Data & Vector Search

RAG Pipelines
FAISS
Pinecone
Milvus
PostgreSQL
Neo4j
Embeddings
Semantic Search

Security & Risk

CVE Analysis
Threat Intelligence
Risk Scoring
PII Detection
NLP Security
Secure Deployment
Privacy-Preserving ML
Guardrails

Cloud & MLOps

Azure
GCP
AWS
Docker
Kubernetes
FastAPI
CI/CD
Model Deployment

Programming Languages & Tools

Python
SQL
C++ (Basics)
JavaScript
Bash
Git
Linux
CUDA

Certifications & Awards

Udacity

  • Deep Learning Nanodegree
  • Computer Vision Nanodegree
  • Generative AI Nanodegree

DeepLearning.AI

  • Retrieval Augmented Generation (RAG)
  • Reliable AI using Guardrails

LinkedIn Learning

  • Advanced SQL
  • AI Foundations: ANN, CNN, RNN, LSTM, GNN, Transformers
  • Advanced NLP with Python
  • GANs and Diffusion Models
  • Transfer Learning Using PyTorch
  • Deep Learning for Computer Vision

Awards & Recognition

  • Third Prize — Innovation & Design on Remote Sensing Data (Hack2Skill)
  • Top Performer — Amazon (2 consecutive months)
  • Instructor — BrightNext Academy (100+ students, 2023-2024)
  • Freelancer — Delivered $10K-$20K+ projects independently

Additional Experience & Awards

Business Analyst, IIFL (Oct 2019 – May 2021)

Managed daily client transactions worth $10M, liaised between clients and the trading desk, and provided strategic market insights.

Instructor, BrightNext Academy (2023-2024)

Taught Machine Learning and Deep Learning courses to over 100 students.

Freelancer (Upwork and LinkedIn) (2022-2024)

Successfully delivered AI and ML projects valued between $10K and $20K as an independent contributor.

Third Prize, Innovation & Design on Remote Sensing Data (Hack2Skill)

Recognized for designing a dashboard for an explainable image classifier that serves both research and production use, competing against 100+ entries.

Let's Build Something Great

Open for collaborations, consulting, and impactful AI projects