2025-08-20 07:03:31 +00:00

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Python data comparison tool built with Python 3.13+. The project is currently in early development with a minimal structure containing:

- A basic Python application entry point (`main.py`)
- Sample data in Excel format (`data/sample-data.xlsx`)
- Standard Python packaging configuration (`pyproject.toml`)

## Development Commands

### Running the Application

```bash
uv run main.py
```

This launches a web-based GUI at http://localhost:8080

### Running Analysis Only (Command Line)

```bash
uv run data_comparator.py
```

### Project Setup

The project uses Python 3.13+ with uv for dependency management. Dependencies include:

- pandas (Excel file processing)
- openpyxl (Excel file reading)
- flask (Web GUI)

## Project Structure

- `main.py` - Main application entry point that launches the web GUI
- `data_comparator.py` - Core comparison logic for KST vs Coordi data analysis
- `web_gui.py` - Flask-based web GUI application
- `analyze_excel.py` - Basic Excel file structure analysis utility
- `data/` - Directory containing sample data files
  - `sample-data.xlsx` - Sample Excel data file for comparison operations
- `templates/` - HTML templates for web GUI (auto-generated)
- `pyproject.toml` - Python project configuration and metadata

## Key Features

- **KST vs Coordi Comparison**: Compares data between KST columns (`Title KR`, `Epi.`) and Coordi columns (`KR title`, `Chap`)
- **Mismatch Categorization**: Identifies KST-only, Coordi-only, and duplicate items
- **Data Reconciliation**: Ensures matching counts after excluding mismatches
- **Web-based GUI**: Interactive interface with tabs for different data views
- **File Upload**: Upload Excel files directly through the web interface
- **Sheet Filtering**: Filter results by specific Excel sheets
- **Real-time Analysis**: Live comparison with detailed mismatch reasons
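
The KST and Coordi columns named above can be located by their header text and turned into title+episode pairs. The following is a minimal sketch, not the code in `data_comparator.py`: it assumes both column sets sit in one sheet under a single header row, and it works on plain lists of rows so it runs without Excel. The helper names `find_column` and `extract_pairs` and the sample rows are hypothetical.

```python
def find_column(header, name):
    """Return the index of the column whose header text equals `name`."""
    for index, cell in enumerate(header):
        if cell is not None and str(cell).strip() == name:
            return index
    raise KeyError(f"column {name!r} not found")


def extract_pairs(rows, title_col, episode_col):
    """Collect (title, episode) pairs, skipping rows with a blank title."""
    header, *body = rows
    t = find_column(header, title_col)
    e = find_column(header, episode_col)
    pairs = []
    for row in body:
        title, episode = row[t], row[e]
        if title is None or str(title).strip() == "":
            continue  # this side has no entry on this row
        episode_text = "" if episode is None else str(episode).strip()
        pairs.append((str(title).strip(), episode_text))
    return pairs


# Hypothetical sheet contents; real rows would come from the Excel file.
rows = [
    ["Chap", "KR title", "Title KR", "Epi."],
    [1, "Alpha", "Alpha", 1],
    [2, "Alpha", "Beta", 3],
    [None, None, "Gamma", 5],
]

kst_pairs = extract_pairs(rows, "Title KR", "Epi.")
coordi_pairs = extract_pairs(rows, "KR title", "Chap")
```

Because the lookup uses header text, reordering the columns in the sheet does not change the extracted pairs.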

## Comparison Logic

The tool compares Excel data by:

1. **Sheet-specific analysis only** - Each sheet is analyzed independently; there is no "All Sheets" mode
2. Finding columns by header name, not by position
3. Extracting title+episode combinations from both datasets within the selected sheet
4. **Duplicate detection** - Only items that appear more than once within the same dataset are marked as duplicates
5. **Mixed duplicate priority** - Items that exist in both datasets but are duplicated on one side take priority over pure duplicates
6. Categorizing mismatches and calculating reconciliation
7. Displaying results with reasons for each discrepancy
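
The categorization and reconciliation steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in `data_comparator.py`: the function names, category keys, and sample pairs are hypothetical, and an item duplicated on both sides is treated as a mixed duplicate, which the rules above do not spell out.

```python
from collections import Counter


def classify(kst_pairs, coordi_pairs):
    """Sort each (title, episode) key into exactly one category."""
    kst_counts = Counter(kst_pairs)
    coordi_counts = Counter(coordi_pairs)
    result = {
        "matched": [],
        "mixed_duplicates": [],
        "duplicates": [],
        "kst_only": [],
        "coordi_only": [],
    }
    for key in sorted(set(kst_counts) | set(coordi_counts)):
        k, c = kst_counts[key], coordi_counts[key]
        if k and c:
            # Present on both sides: the mixed check runs before the
            # pure-duplicate check, so it takes priority.
            name = "mixed_duplicates" if (k > 1 or c > 1) else "matched"
        elif k > 1 or c > 1:
            name = "duplicates"  # repeated within one dataset only
        elif k:
            name = "kst_only"
        else:
            name = "coordi_only"
        result[name].append(key)
    return result


def reconciled_counts(kst_pairs, coordi_pairs, result):
    """Count the rows left on each side once every mismatch is excluded."""
    keep = set(result["matched"])
    kst_left = sum(1 for pair in kst_pairs if pair in keep)
    coordi_left = sum(1 for pair in coordi_pairs if pair in keep)
    return kst_left, coordi_left


# Hypothetical pairs, one per category.
kst = [("A", "1"), ("B", "2"), ("B", "2"), ("C", "3"), ("D", "4"), ("D", "4")]
coordi = [("A", "1"), ("B", "2"), ("E", "5")]

result = classify(kst, coordi)
counts = reconciled_counts(kst, coordi, result)
```

Here `B` is a mixed duplicate (present on both sides, repeated in KST), `D` is a pure KST duplicate, and the two reconciled counts are equal because only `A` remains on each side.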

### BA Confirmed Cases

- **US URGENT**: `금수의 영역 - Episode 17`, `신결 - Episode 23` (Coordi duplicates), `트윈 가이드 - Episode 31` (mixed duplicate)
- **TH URGENT**: `백라이트 - Episode 53-1x(휴재)` (KST duplicate, doesn't appear in Coordi)