Browser Automation AI Agent

AI
Backend

Tech Stack

TypeScript
Node.js
LLM
AI Agents
Google Gemini
MCP
Playwright

Description

An AI agent that executes a workflow in a web browser from natural language instructions and captures screenshots of the UI states along the way. The agent interprets the user's intent with an LLM and carries out the resulting browser automation tasks through the Playwright MCP Server.

  • Natural Language Processing: Understands user queries in plain English
  • Automated Web Interactions: Performs actions like clicking, typing, and navigating
  • Screenshot Capture: Takes screenshots of relevant UI elements or pages
  • Human-in-the-Loop: Optional approval before executing automation steps
  • Integration with AI Models: Uses Google's Generative AI for processing queries
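
The flow described above (plan from an instruction, optionally ask for approval, execute each step, capture screenshots) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `BrowserAction`, `BrowserTools`, `Planner`, `Approver`, and `runWorkflow` are hypothetical names, the planner stands in for an LLM call, and `BrowserTools.run` stands in for whatever tool calls the Playwright MCP Server exposes.

```typescript
// Hypothetical action vocabulary; the real agent's tool set may differ.
type BrowserAction =
  | { kind: "navigate"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "screenshot"; name: string };

// Stand-in for a client that forwards actions to a browser tool server.
interface BrowserTools {
  run(action: BrowserAction): Promise<string>;
}

// Stand-in for the LLM: turns an instruction into an ordered list of actions.
type Planner = (instruction: string) => Promise<BrowserAction[]>;

// Human-in-the-loop hook: resolves to true to allow a step, false to skip it.
type Approver = (action: BrowserAction) => Promise<boolean>;

interface StepResult {
  action: BrowserAction;
  status: "executed" | "skipped";
  output?: string;
}

// Plan once, then run each step in order, consulting the approver if one is given.
async function runWorkflow(
  instruction: string,
  plan: Planner,
  tools: BrowserTools,
  approve?: Approver,
): Promise<StepResult[]> {
  const actions = await plan(instruction);
  const results: StepResult[] = [];
  for (const action of actions) {
    if (approve && !(await approve(action))) {
      results.push({ action, status: "skipped" });
      continue;
    }
    const output = await tools.run(action);
    results.push({ action, status: "executed", output });
  }
  return results;
}

// Demo wiring with stubs, so the sketch runs without a browser or an API key.
const demoPlanner: Planner = async (_instruction) => [
  { kind: "navigate", url: "https://example.com" },
  { kind: "screenshot", name: "landing-page" },
];

const loggingTools: BrowserTools = {
  // A real implementation would call the tool server here instead of echoing.
  run: async (action) => `ran ${action.kind}`,
};

runWorkflow("Open the example site and capture it", demoPlanner, loggingTools)
  .then((results) => {
    for (const r of results) {
      console.log(r.status, r.action.kind, r.output ?? "");
    }
  });
```

Leaving the approver optional keeps the fully automated path and the human-in-the-loop path in one loop; a skipped step is recorded rather than dropped, so the run can still be reviewed afterwards.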

Screenshots & Video Demo

/projects/browser-automation-ai-agent/ai-agent-architecture.png