2025-08-20 07:03:31 +00:00
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a data comparison tool built with Python 3.13+. The project is in early development; its starting structure consists of:
- A basic Python application entry point (`main.py`)
- Sample data in Excel format (`data/sample-data.xlsx`)
- Standard Python packaging configuration (`pyproject.toml`)
## Development Commands
### Running the Application
```bash
uv run main.py
```
This launches the web-based GUI at http://localhost:8080.
### Running Analysis Only (Command Line)
```bash
uv run data_comparator.py
```
### Project Setup
The project uses Python 3.13+ with uv for dependency management. Dependencies include:
- pandas (Excel file processing)
- openpyxl (Excel file reading)
- flask (Web GUI)
## Project Structure
- `main.py` - Main application entry point that launches the web GUI
- `data_comparator.py` - Core comparison logic for KST vs Coordi data analysis
- `web_gui.py` - Flask-based web GUI application
- `analyze_excel.py` - Basic Excel file structure analysis utility
- `data/` - Directory containing sample data files
  - `sample-data.xlsx` - Sample Excel data file for comparison operations
- `templates/` - HTML templates for web GUI (auto-generated)
- `pyproject.toml` - Python project configuration and metadata
## Key Features
- **KST vs Coordi Comparison**: Compares data between KST columns (`Title KR`, `Epi.`) and Coordi columns (`KR title`, `Chap`)
- **Mismatch Categorization**: Identifies KST-only, Coordi-only, and duplicate items
- **Data Reconciliation**: Verifies that the KST and Coordi item counts match once mismatched items are excluded
- **Web-based GUI**: Interactive interface with tabs for different data views
- **File Upload**: Upload Excel files directly through the web interface
- **Sheet Filtering**: Filter results by specific Excel sheets
- **Real-time Analysis**: Live comparison with detailed mismatch reasons
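The categorization and reconciliation features above can be illustrated with a minimal sketch. It assumes each side has already been reduced to a list of `(title, episode)` pairs normalised to strings; the helper names `categorize` and `matched_count` are hypothetical and do not describe the actual code in `data_comparator.py`.

```python
from collections import Counter

# Assumption: each item is a (title, episode) pair, already normalised to strings.
Item = tuple[str, str]


def categorize(kst: list[Item], coordi: list[Item]) -> dict[str, list[Item]]:
    """Hypothetical helper: group items into KST-only, Coordi-only and duplicate buckets."""
    kst_counts, coordi_counts = Counter(kst), Counter(coordi)
    return {
        # Pairs found on one side but never on the other.
        "kst_only": sorted(k for k in kst_counts if k not in coordi_counts),
        "coordi_only": sorted(k for k in coordi_counts if k not in kst_counts),
        # Pairs listed more than once within the same side.
        "kst_duplicates": sorted(k for k, n in kst_counts.items() if n > 1),
        "coordi_duplicates": sorted(k for k, n in coordi_counts.items() if n > 1),
    }


def matched_count(items: list[Item], other: list[Item]) -> int:
    """Hypothetical helper: rows left after dropping one-sided items and extra duplicate copies."""
    counts, other_keys = Counter(items), set(other)
    one_sided = sum(n for k, n in counts.items() if k not in other_keys)
    extra_copies = sum(n - 1 for k, n in counts.items() if k in other_keys)
    return len(items) - one_sided - extra_copies


kst = [("Title A", "1"), ("Title A", "2"), ("Title B", "1"), ("Title B", "1")]
coordi = [("Title A", "1"), ("Title B", "1"), ("Title C", "3")]

result = categorize(kst, coordi)
print(result["kst_only"])     # [('Title A', '2')]
print(result["coordi_only"])  # [('Title C', '3')]
print(matched_count(kst, coordi) == matched_count(coordi, kst))  # True
```

In this sketch, reconciliation is the arithmetic check that both sides leave the same number of rows (here 4 − 1 − 1 = 2 and 3 − 1 − 0 = 2) once one-sided items and surplus duplicate copies are removed.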
## Comparison Logic
The tool compares Excel data by:
1. Finding columns by header names (not positions)
2. Extracting title+episode combinations from both datasets
3. Categorizing mismatches and computing the reconciled counts
4. Displaying results with reasons for each discrepancy
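Steps 1 and 2 can be sketched without relying on fixed column positions. This assumes a sheet arrives as a list of row tuples with the header row first (the shape openpyxl's `iter_rows(values_only=True)` yields) and that both the KST and Coordi columns share that header row; `normalise`, `column_index` and `extract_pairs` are hypothetical names, not functions from this repository.

```python
# Assumption: rows are tuples with the header row first, KST and Coordi columns side by side.
Row = tuple[object, ...]


def normalise(value: object) -> str:
    """Render a cell as trimmed text; Excel often stores whole numbers as floats (1.0)."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value).strip()


def column_index(header: Row, name: str) -> int:
    """Locate a column by its header text rather than by a fixed position."""
    for i, cell in enumerate(header):
        if cell is not None and normalise(cell) == name:
            return i
    raise KeyError(f"column {name!r} not found in header")


def extract_pairs(rows: list[Row], title_col: str, episode_col: str) -> list[tuple[str, str]]:
    """Collect (title, episode) pairs, skipping rows where either cell is empty."""
    header, *data = rows
    t, e = column_index(header, title_col), column_index(header, episode_col)
    pairs = []
    for row in data:
        # Short rows are treated as having empty trailing cells.
        title = row[t] if t < len(row) else None
        episode = row[e] if e < len(row) else None
        if title in (None, "") or episode in (None, ""):
            continue
        pairs.append((normalise(title), normalise(episode)))
    return pairs


sheet = [
    ("Title KR", "Epi.", "KR title", "Chap"),
    ("Title A", 1, "Title A", 1.0),
    ("Title B", 2, None, None),
]
print(extract_pairs(sheet, "Title KR", "Epi."))  # [('Title A', '1'), ('Title B', '2')]
print(extract_pairs(sheet, "KR title", "Chap"))  # [('Title A', '1')]
```

Because lookups go through the header text, reordering or inserting columns in the workbook does not change which cells are read.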