data-comparison/CLAUDE.md
2025-08-21 11:23:33 +07:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Python data comparison tool that requires Python 3.13+. The project is in early development, and its minimal structure includes:

  • A basic Python application entry point (main.py)
  • Sample data in Excel format (data/sample-data.xlsx)
  • Standard Python packaging configuration (pyproject.toml)

Development Commands

Running the Application

uv run main.py

This launches a web-based GUI at http://localhost:8080.

Running Analysis Only (Command Line)

uv run data_comparator.py

Project Setup

The project uses Python 3.13+ with uv for dependency management. Dependencies include:

  • pandas (Excel file processing)
  • openpyxl (Excel file reading)
  • flask (Web GUI)

Project Structure

  • main.py - Main application entry point that launches the web GUI
  • data_comparator.py - Core comparison logic for KST vs Coordi data analysis
  • web_gui.py - Flask-based web GUI application
  • analyze_excel.py - Basic Excel file structure analysis utility
  • data/ - Directory containing sample data files
    • sample-data.xlsx - Sample Excel data file for comparison operations
  • templates/ - HTML templates for web GUI (auto-generated)
  • pyproject.toml - Python project configuration and metadata

Key Features

  • KST vs Coordi Comparison: Compares data between KST columns (Title KR, Epi.) and Coordi columns (KR title, Chap)
  • Mismatch Categorization: Identifies KST-only, Coordi-only, and duplicate items
  • Data Reconciliation: Ensures matching counts after excluding mismatches
  • Web-based GUI: Interactive interface with tabs for different data views
  • File Upload: Upload Excel files directly through the web interface
  • Sheet Filtering: Filter results by specific Excel sheets
  • Real-time Analysis: Live comparison with detailed mismatch reasons
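
The reconciliation feature can be pictured as a simple count check. The sketch below is a minimal illustration, not the tool's actual code: the function name `reconcile`, its parameters, and the rule "remaining = total minus excluded mismatches" are assumptions made for this example.

```python
def reconcile(kst_total, kst_excluded, coordi_total, coordi_excluded):
    """Compare row counts after mismatches are excluded (hypothetical helper).

    Each *_total is the number of items found on one side of a sheet;
    each *_excluded is the number of those items flagged as mismatches.
    """
    kst_remaining = kst_total - kst_excluded
    coordi_remaining = coordi_total - coordi_excluded
    return {
        "kst_remaining": kst_remaining,
        "coordi_remaining": coordi_remaining,
        # Reconciled means both sides agree once mismatches are set aside.
        "reconciled": kst_remaining == coordi_remaining,
    }


summary = reconcile(kst_total=10, kst_excluded=2, coordi_total=9, coordi_excluded=1)
```

If the two remaining counts differ, some mismatch has probably not been categorized yet, which is a useful signal when debugging the comparison.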

Comparison Logic

The tool compares Excel data by:

  1. Sheet-specific analysis only - Each sheet is analyzed independently; there is no "All Sheets" mode
  2. Fixed column positions - KST data is read from columns I & J, Coordi data from columns C & D
  3. Extraction - Title+episode combinations are extracted from both datasets within the selected sheet
  4. Duplicate detection - Only items that appear multiple times within the same dataset are marked as duplicates
  5. Mixed duplicate priority - Items that exist in both datasets but are duplicated on one side take priority over pure duplicates
  6. Categorization - Mismatches are categorized and the reconciliation is calculated
  7. Display - Results are shown with a reason for each discrepancy
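
The categorization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in data_comparator.py: the function name `categorize`, the category labels, and the reading of "mixed duplicate" as "present in both datasets and duplicated on at least one side" are all choices made for this example.

```python
from collections import Counter


def categorize(kst_items, coordi_items):
    """Classify (title, episode) pairs from one sheet (hypothetical helper).

    Both arguments are lists of (title, episode) tuples. The labels
    returned here are illustrative, not the tool's real output.
    """
    kst_counts = Counter(kst_items)
    coordi_counts = Counter(coordi_items)
    result = {}
    for item in kst_counts.keys() | coordi_counts.keys():
        k = kst_counts[item]      # Counter returns 0 for missing keys
        c = coordi_counts[item]
        duplicated = k > 1 or c > 1
        if k and c:
            # Present on both sides: the mixed case is checked first,
            # so it takes priority over the pure duplicate labels.
            result[item] = "mixed duplicate" if duplicated else "match"
        elif k:
            result[item] = "KST duplicate" if duplicated else "KST only"
        else:
            result[item] = "Coordi duplicate" if duplicated else "Coordi only"
    return result


kst = [("A", "1"), ("B", "2"), ("B", "2"), ("C", "3"), ("D", "4")]
coordi = [("A", "1"), ("C", "3"), ("C", "3"), ("E", "5")]
categories = categorize(kst, coordi)
```

Counting within each dataset separately keeps an item that appears once on each side from being mistaken for a duplicate.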

Column Mapping

  • KST Data: Column I (title) and Column J (chapter/episode)
  • Coordi Data: Column C (title) and Column D (chapter/episode)
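
As a rough illustration of the fixed column mapping, the sketch below converts the column letters to zero-based indices and pulls title+episode pairs out of plain row tuples. The names `column_index`, `extract_pairs`, `KST_COLUMNS`, and `COORDI_COLUMNS` are hypothetical, and skipping rows with a blank title or episode is an assumption, not documented behavior.

```python
def column_index(letter):
    """Convert an Excel column letter such as 'C' or 'AA' to a zero-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# Column positions taken from the mapping above.
KST_COLUMNS = (column_index("I"), column_index("J"))
COORDI_COLUMNS = (column_index("C"), column_index("D"))


def _clean(cell):
    """Normalize a cell value to a stripped string; empty cells become ''."""
    return "" if cell is None else str(cell).strip()


def extract_pairs(rows, columns):
    """Return (title, episode) pairs from rows, skipping incomplete rows.

    Assumption: a row counts only when both the title and episode cells
    are non-empty.
    """
    title_col, episode_col = columns
    pairs = []
    for row in rows:
        if len(row) <= max(title_col, episode_col):
            continue  # row is too short to contain both columns
        title, episode = _clean(row[title_col]), _clean(row[episode_col])
        if title and episode:
            pairs.append((title, episode))
    return pairs
```

Rows in this shape could come from openpyxl's `worksheet.iter_rows(values_only=True)`, which yields one tuple of cell values per row.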

BA Confirmed Cases

  • US URGENT: 금수의 영역 - Episode 17, 신결 - Episode 23 (Coordi duplicates), 트윈 가이드 - Episode 31 (mixed duplicate)
  • TH URGENT: 백라이트 - Episode 53-1x(휴재) (KST duplicate, doesn't appear in Coordi)
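
One way to keep these confirmed cases from regressing is to record them as expected results and compare them against the tool's output. The sketch below is only an illustration: the `CONFIRMED_CASES` table, the `check_confirmed` helper, the category labels, and the representation of episodes as strings are all assumptions for this example.

```python
# Expected categories for the BA-confirmed cases listed above,
# keyed by sheet name and then by (title, episode).
CONFIRMED_CASES = {
    "US URGENT": {
        ("금수의 영역", "17"): "Coordi duplicate",
        ("신결", "23"): "Coordi duplicate",
        ("트윈 가이드", "31"): "mixed duplicate",
    },
    "TH URGENT": {
        ("백라이트", "53-1x(휴재)"): "KST duplicate",
    },
}


def check_confirmed(sheet, results):
    """Return confirmed cases whose category in results differs from the expectation.

    results maps (title, episode) to a category label for one sheet.
    The returned dict maps each failing item to (expected, actual).
    """
    expected = CONFIRMED_CASES.get(sheet, {})
    return {
        item: (want, results.get(item))
        for item, want in expected.items()
        if results.get(item) != want
    }
```

An empty return value means every confirmed case for that sheet was categorized as expected.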