CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is a data comparison tool that requires Python 3.13+. The project is in early development; its basic structure contains:
- A basic Python application entry point (main.py)
- Sample data in Excel format (data/sample-data.xlsx)
- Standard Python packaging configuration (pyproject.toml)
Development Commands
Running the Application
uv run main.py
This launches a web-based GUI at http://localhost:8080
Running Analysis Only (Command Line)
uv run data_comparator.py
Project Setup
The project uses Python 3.13+ with uv for dependency management. Dependencies include:
- pandas (Excel file processing)
- openpyxl (Excel file reading)
- flask (Web GUI)
Project Structure
- main.py - Main application entry point that launches the web GUI
- data_comparator.py - Core comparison logic for KST vs Coordi data analysis
- web_gui.py - Flask-based web GUI application
- analyze_excel.py - Basic Excel file structure analysis utility
- data/ - Directory containing sample data files
  - sample-data.xlsx - Sample Excel data file for comparison operations
- templates/ - HTML templates for web GUI (auto-generated)
- pyproject.toml - Python project configuration and metadata
Key Features
- KST vs Coordi Comparison: Compares data between KST columns (Title KR, Epi.) and Coordi columns (KR title, Chap)
- Mismatch Categorization: Identifies KST-only, Coordi-only, and duplicate items
- Data Reconciliation: Ensures matching counts after excluding mismatches
- Web-based GUI: Interactive interface with tabs for different data views
- File Upload: Upload Excel files directly through the web interface
- Sheet Filtering: Filter results by specific Excel sheets
- Real-time Analysis: Live comparison with detailed mismatch reasons
Comparison Logic
The tool compares Excel data by:
- Sheet-specific analysis only - There is no "All Sheets" mode; each sheet is analyzed independently
- Fixed column positions - KST data from columns I & J, Coordi data from columns C & D
- Extracting title+episode combinations from both datasets within the selected sheet
- Duplicate detection - Only items that appear multiple times within the same dataset are marked as duplicates
- Mixed duplicate priority - Items that exist in both datasets but have duplicates on one side are prioritized over pure duplicates
- Categorizing mismatches and calculating reconciliation
- Displaying results with reasons for each discrepancy
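The steps above can be sketched in plain Python. This is a minimal illustration, not the code in data_comparator.py: the function names `categorize` and `reconcile`, the category labels, and the sample pairs are all hypothetical. It also assumes that "mixed duplicate" means an item present in both datasets that repeats on at least one side, and that reconciliation means counting the rows left on each side once every mismatch is excluded.

```python
from collections import Counter

# Hypothetical category labels; the real tool may name these differently.
CATEGORIES = (
    "matched", "mixed_duplicate", "kst_duplicate",
    "coordi_duplicate", "kst_only", "coordi_only",
)


def categorize(kst_items, coordi_items):
    """Group (title, episode) pairs from one sheet into the categories above."""
    kst, coordi = Counter(kst_items), Counter(coordi_items)
    result = {name: [] for name in CATEGORIES}
    # dict.fromkeys keeps first-seen order and drops repeats.
    for item in dict.fromkeys([*kst_items, *coordi_items]):
        k, c = kst[item], coordi[item]
        if k and c:
            # Present on both sides: a repeat on either side makes it "mixed",
            # which takes priority over the pure duplicate categories below.
            name = "mixed_duplicate" if (k > 1 or c > 1) else "matched"
        elif k > 1:
            name = "kst_duplicate"
        elif c > 1:
            name = "coordi_duplicate"
        else:
            name = "kst_only" if k else "coordi_only"
        result[name].append(item)
    return result


def reconcile(kst_items, coordi_items, result):
    """Return the row counts left on each side once mismatches are excluded."""
    matched = set(result["matched"])
    kst_left = sum(1 for item in kst_items if item in matched)
    coordi_left = sum(1 for item in coordi_items if item in matched)
    return kst_left, coordi_left


# Made-up sample pairs, not taken from data/sample-data.xlsx.
kst = [("Title A", "1"), ("Title B", "2"), ("Title B", "2"), ("Title C", "3")]
coordi = [("Title A", "1"), ("Title B", "2"), ("Title D", "4")]
result = categorize(kst, coordi)
print(result["mixed_duplicate"])       # [('Title B', '2')]
print(reconcile(kst, coordi, result))  # (1, 1)
```

Checking the both-sides case first is what gives mixed duplicates priority: an item is only considered a pure KST or Coordi duplicate when it is absent from the other dataset.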
Column Mapping
- KST Data: Column I (title) and Column J (chapter/episode)
- Coordi Data: Column C (title) and Column D (chapter/episode)
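As a rough illustration of the fixed column mapping, the sketch below converts column letters to zero-based positions and pulls title and episode pairs out of rows represented as plain Python lists. The helper names `column_index` and `extract_pairs`, the constants, and the rule of skipping blank or short rows are assumptions made for this example; the actual tool reads the sheet with pandas and openpyxl and may handle blanks differently.

```python
# Column letters from the mapping above; the constant names are hypothetical.
KST_COLUMNS = ("I", "J")
COORDI_COLUMNS = ("C", "D")


def column_index(letter):
    """Convert an Excel column letter such as 'I' to a zero-based index."""
    index = 0
    for char in letter.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def extract_pairs(rows, columns):
    """Collect (title, episode) string pairs from the given column letters."""
    title_idx, episode_idx = (column_index(c) for c in columns)
    pairs = []
    for row in rows:
        # Assumption: rows too short to hold both columns are skipped.
        if len(row) <= max(title_idx, episode_idx):
            continue
        title, episode = row[title_idx], row[episode_idx]
        # Assumption: rows with a blank title or episode are skipped.
        if title is None or episode is None:
            continue
        pairs.append((str(title).strip(), str(episode).strip()))
    return pairs


# Made-up row: Coordi data in columns C and D, KST data in columns I and J.
row = [None, None, "Title A", 1, None, None, None, None, "Title A ", 1]
print(extract_pairs([row], KST_COLUMNS))     # [('Title A', '1')]
print(extract_pairs([row], COORDI_COLUMNS))  # [('Title A', '1')]
```

Normalizing both values to stripped strings keeps a stray trailing space from turning an otherwise identical pair into a mismatch.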
BA Confirmed Cases
- US URGENT: 금수의 영역 - Episode 17, 신결 - Episode 23 (Coordi duplicates), 트윈 가이드 - Episode 31 (mixed duplicate)
- TH URGENT: 백라이트 - Episode 53-1x(휴재) (KST duplicate, doesn't appear in Coordi)