# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a data comparison tool that requires Python 3.13+. It compares KST and Coordi data stored in Excel files. The project is in early development, and its core pieces include:

- A basic Python application entry point (`main.py`)
- Sample data in Excel format (`data/sample-data.xlsx`)
- Standard Python packaging configuration (`pyproject.toml`)

## Development Commands

### Running the Application
```bash
uv run main.py
```
This launches a web-based GUI at http://localhost:8080.

### Running Analysis Only (Command Line)
```bash
uv run data_comparator.py
```

### Project Setup
The project uses Python 3.13+ with uv for dependency management. Dependencies include:
- pandas (Excel file processing)
- openpyxl (Excel file reading)
- flask (Web GUI)

## Project Structure

- `main.py` - Main application entry point that launches the web GUI
- `data_comparator.py` - Core comparison logic for KST vs Coordi data analysis
- `web_gui.py` - Flask-based web GUI application
- `analyze_excel.py` - Basic Excel file structure analysis utility
- `data/` - Directory containing sample data files
  - `sample-data.xlsx` - Sample Excel data file for comparison operations
- `templates/` - HTML templates for web GUI (auto-generated)
- `pyproject.toml` - Python project configuration and metadata

## Key Features

- **KST vs Coordi Comparison**: Compares data between KST columns (`Title KR`, `Epi.`) and Coordi columns (`KR title`, `Chap`)
- **Mismatch Categorization**: Identifies KST-only, Coordi-only, and duplicate items
- **Data Reconciliation**: Verifies that the KST and Coordi item counts match once mismatched items are excluded
- **Web-based GUI**: Interactive interface with tabs for different data views
- **File Upload**: Upload Excel files directly through the web interface
- **Sheet Filtering**: Filter results by specific Excel sheets
- **Real-time Analysis**: Live comparison with detailed mismatch reasons

## Comparison Logic

The tool compares Excel data by:
1. Finding columns by header names (not positions)
2. Extracting title+episode combinations from both datasets
3. Categorizing mismatches (KST-only, Coordi-only, duplicates) and checking that the remaining counts reconcile
4. Displaying results with reasons for each discrepancy
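
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `data_comparator.py`: the function names `extract_pairs` and `compare_sheet` are hypothetical, a sheet is assumed to arrive as a header row followed by data rows (plain tuples), and the duplicate and reconciliation rules shown are one plausible reading of the description above.

```python
from collections import Counter

# Header names taken from the Key Features section.
KST_HEADERS = ("Title KR", "Epi.")
COORDI_HEADERS = ("KR title", "Chap")


def extract_pairs(rows, title_header, episode_header):
    """Return (title, episode) pairs from rows whose first row is the header."""
    header, *data = rows
    # Look columns up by name so the column order in the sheet does not matter.
    title_idx = header.index(title_header)  # raises ValueError if missing
    episode_idx = header.index(episode_header)
    pairs = []
    for row in data:
        title, episode = row[title_idx], row[episode_idx]
        if title is None or episode is None:
            continue  # skip rows with a blank title or episode cell
        pairs.append((str(title).strip(), str(episode).strip()))
    return pairs


def compare_sheet(rows):
    """Categorize mismatches and check that the remaining counts reconcile."""
    kst_pairs = extract_pairs(rows, *KST_HEADERS)
    coordi_pairs = extract_pairs(rows, *COORDI_HEADERS)
    kst_counts, coordi_counts = Counter(kst_pairs), Counter(coordi_pairs)

    kst_only = sorted(set(kst_counts) - set(coordi_counts))
    coordi_only = sorted(set(coordi_counts) - set(kst_counts))
    # Assumption: a pair is a duplicate if it repeats on either side.
    duplicates = sorted(
        pair
        for pair in set(kst_counts) | set(coordi_counts)
        if kst_counts[pair] > 1 or coordi_counts[pair] > 1
    )

    # One (pair, reason) entry per discrepancy; a pair may appear twice.
    reasons = (
        [(pair, "present in KST only") for pair in kst_only]
        + [(pair, "present in Coordi only") for pair in coordi_only]
        + [(pair, "appears more than once") for pair in duplicates]
    )

    # Assumption: reconciliation drops every mismatched pair from both sides
    # and then compares how many rows are left on each side.
    excluded = set(kst_only) | set(coordi_only) | set(duplicates)
    kst_remaining = [pair for pair in kst_pairs if pair not in excluded]
    coordi_remaining = [pair for pair in coordi_pairs if pair not in excluded]

    return {
        "kst_only": kst_only,
        "coordi_only": coordi_only,
        "duplicates": duplicates,
        "reasons": reasons,
        "kst_remaining": len(kst_remaining),
        "coordi_remaining": len(coordi_remaining),
        "reconciled": len(kst_remaining) == len(coordi_remaining),
    }


# Made-up sample rows for illustration only (not from sample-data.xlsx).
sample_rows = [
    ("Title KR", "Epi.", "KR title", "Chap"),
    ("Alpha", 1, "Alpha", 1),
    ("Alpha", 2, "Alpha", 2),
    ("Beta", 1, "Gamma", 1),
    ("Beta", 1, "Alpha", 3),
    ("Delta", 4, None, None),
]

for pair, reason in compare_sheet(sample_rows)["reasons"]:
    print(pair, "-", reason)
```

Counting with `Counter` keeps duplicate detection and the set differences in a single pass over each side, and converting both title and episode to stripped strings avoids false mismatches between values such as `1` and `"1 "`.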