# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Python data comparison tool built with Python 3.13+. The project is currently in early development with a minimal structure containing:

- A basic Python application entry point (`main.py`)
- Sample data in Excel format (`data/sample-data.xlsx`)
- Standard Python packaging configuration (`pyproject.toml`)

## Development Commands

### Running the Application

```bash
uv run main.py
```

This launches a web-based GUI at http://localhost:8080

### Running Analysis Only (Command Line)

```bash
uv run data_comparator.py
```

### Project Setup

The project uses Python 3.13+ with uv for dependency management. Dependencies include:

- pandas (Excel file processing)
- openpyxl (Excel file reading)
- flask (Web GUI)

## Project Structure

- `main.py` - Main application entry point that launches the web GUI
- `data_comparator.py` - Core comparison logic for KST vs Coordi data analysis
- `web_gui.py` - Flask-based web GUI application
- `analyze_excel.py` - Basic Excel file structure analysis utility
- `data/` - Directory containing sample data files
  - `sample-data.xlsx` - Sample Excel data file for comparison operations
- `templates/` - HTML templates for web GUI (auto-generated)
- `pyproject.toml` - Python project configuration and metadata

## Key Features

- **KST vs Coordi Comparison**: Compares data between KST columns (`Title KR`, `Epi.`) and Coordi columns (`KR title`, `Chap`)
- **Mismatch Categorization**: Identifies KST-only, Coordi-only, and duplicate items
- **Data Reconciliation**: Ensures matching counts after excluding mismatches
- **Web-based GUI**: Interactive interface with tabs for different data views
- **File Upload**: Upload Excel files directly through the web interface
- **Sheet Filtering**: Filter results by specific Excel sheets
- **Real-time Analysis**: Live comparison with detailed mismatch reasons

## Comparison Logic

The tool compares Excel data by:

1. Finding columns by header names (not positions)
2. Extracting title+episode combinations from both datasets
3. Categorizing mismatches and calculating reconciliation
4. Displaying results with reasons for each discrepancy
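
The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code in `data_comparator.py`: the function names `extract_pairs` and `compare` are hypothetical, and the sheet is modeled as a plain list of rows (header row first) so the example runs without pandas. It also assumes that the KST and Coordi columns live in the same sheet, and that "reconciliation" means the two sides have equal row counts once KST-only, Coordi-only, and duplicate pairs are excluded.

```python
from collections import Counter

# Header names as listed under Key Features.
KST_COLUMNS = ("Title KR", "Epi.")
COORDI_COLUMNS = ("KR title", "Chap")


def extract_pairs(rows, title_header, episode_header):
    """Return (title, episode) pairs, locating columns by header name."""
    header, *body = rows
    title_idx = header.index(title_header)      # ValueError if header is missing
    episode_idx = header.index(episode_header)
    pairs = []
    for row in body:
        title, episode = row[title_idx], row[episode_idx]
        if title is None or episode is None:
            continue  # skip rows where this side is blank
        pairs.append((str(title).strip(), str(episode).strip()))
    return pairs


def compare(rows):
    """Categorize mismatches between the KST and Coordi sides (hypothetical helper)."""
    kst = Counter(extract_pairs(rows, *KST_COLUMNS))
    coordi = Counter(extract_pairs(rows, *COORDI_COLUMNS))

    kst_only = sorted(set(kst) - set(coordi))
    coordi_only = sorted(set(coordi) - set(kst))
    duplicates = sorted(
        {p for p, n in kst.items() if n > 1} | {p for p, n in coordi.items() if n > 1}
    )

    # Assumed reconciliation rule: counts must match after excluding mismatches.
    excluded = set(kst_only) | set(coordi_only) | set(duplicates)
    kst_remaining = sum(n for p, n in kst.items() if p not in excluded)
    coordi_remaining = sum(n for p, n in coordi.items() if p not in excluded)

    return {
        "kst_only": kst_only,
        "coordi_only": coordi_only,
        "duplicates": duplicates,
        "reconciled": kst_remaining == coordi_remaining,
    }


# Made-up sample rows for illustration; not taken from data/sample-data.xlsx.
sample_rows = [
    ["Title KR", "Epi.", "KR title", "Chap"],
    ["A", 1, "A", 1],
    ["A", 2, "A", 3],
    ["B", 1, "B", 1],
    ["B", 1, None, None],
]
result = compare(sample_rows)
print(result)
```

Looking columns up with `header.index(...)` rather than a fixed position is what keeps the comparison working when columns are reordered in the spreadsheet. In the real tool the rows would presumably come from pandas/openpyxl rather than a hand-written list.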