data-comparison/CLAUDE.md
2025-08-20 14:03:31 +07:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a data comparison tool that requires Python 3.13+. The project is in early development, with a minimal structure that includes:

  • A basic Python application entry point (main.py)
  • Sample data in Excel format (data/sample-data.xlsx)
  • Standard Python packaging configuration (pyproject.toml)

Development Commands

Running the Application

uv run main.py

This launches a web-based GUI at http://localhost:8080.

Running Analysis Only (Command Line)

uv run data_comparator.py

Project Setup

The project uses Python 3.13+ with uv for dependency management. Dependencies include:

  • pandas (Excel file processing)
  • openpyxl (reading .xlsx files; pandas uses it as its Excel engine)
  • flask (Web GUI)

Project Structure

  • main.py - Main application entry point that launches the web GUI
  • data_comparator.py - Core comparison logic for KST vs Coordi data analysis
  • web_gui.py - Flask-based web GUI application
  • analyze_excel.py - Basic Excel file structure analysis utility
  • data/ - Directory containing sample data files
    • sample-data.xlsx - Sample Excel data file for comparison operations
  • templates/ - HTML templates for web GUI (auto-generated)
  • pyproject.toml - Python project configuration and metadata

Key Features

  • KST vs Coordi Comparison: Compares data between KST columns (Title KR, Epi.) and Coordi columns (KR title, Chap)
  • Mismatch Categorization: Identifies KST-only, Coordi-only, and duplicate items
  • Data Reconciliation: Checks that the KST and Coordi item counts match once mismatched items are excluded
  • Web-based GUI: Interactive interface with tabs for different data views
  • File Upload: Upload Excel files directly through the web interface
  • Sheet Filtering: Filter results by specific Excel sheets
  • Real-time Analysis: Live comparison with detailed mismatch reasons
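
The mismatch categorization and reconciliation features can be illustrated with a minimal sketch. It assumes each side has already been reduced to a list of (title, episode) tuples. The function name `categorize`, the result keys, and the sample data are hypothetical and are not taken from data_comparator.py. It also assumes one reading of "reconciliation": once extra duplicate copies and one-sided items are dropped, both sides should hold the same number of items.

```python
from collections import Counter


def categorize(kst, coordi):
    """Split two lists of (title, episode) pairs into mismatch categories."""
    kst_counts, coordi_counts = Counter(kst), Counter(coordi)
    kst_set, coordi_set = set(kst_counts), set(coordi_counts)

    result = {
        # Items present on one side only.
        "kst_only": sorted(kst_set - coordi_set),
        "coordi_only": sorted(coordi_set - kst_set),
        # Items that appear more than once within a single side.
        "kst_duplicates": sorted(k for k, n in kst_counts.items() if n > 1),
        "coordi_duplicates": sorted(k for k, n in coordi_counts.items() if n > 1),
    }

    # Reconciliation (assumed definition): row count minus extra duplicate
    # copies minus one-sided items should be equal on both sides.
    kst_extra = sum(n - 1 for n in kst_counts.values())
    coordi_extra = sum(n - 1 for n in coordi_counts.values())
    result["kst_reconciled"] = len(kst) - kst_extra - len(result["kst_only"])
    result["coordi_reconciled"] = (
        len(coordi) - coordi_extra - len(result["coordi_only"])
    )
    result["reconciled"] = result["kst_reconciled"] == result["coordi_reconciled"]
    return result


# Hypothetical sample data, for illustration only.
kst = [("Title A", 1), ("Title A", 2), ("Title A", 2), ("Title B", 1)]
coordi = [("Title A", 1), ("Title A", 2), ("Title C", 5)]
report = categorize(kst, coordi)
```

With this sample, `("Title B", 1)` lands in the KST-only group, `("Title C", 5)` in the Coordi-only group, and `("Title A", 2)` is flagged as a KST duplicate. Using `Counter` keeps duplicate detection and set-based comparison in a single pass over each list.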

Comparison Logic

The tool compares Excel data by:

  1. Finding columns by header names (not positions)
  2. Extracting title+episode combinations from both datasets
  3. Categorizing mismatches (KST-only, Coordi-only, duplicates) and checking that the remaining counts reconcile
  4. Displaying results with reasons for each discrepancy
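
The first two steps, plus a simple form of the last one, might look like the sketch below. It uses plain lists of rows in place of a sheet loaded with pandas or openpyxl so that it runs without any dependencies. The helper names `find_column` and `extract_pairs`, the reason strings, and the sample sheet are assumptions made for illustration. The header names come from the Key Features list above, and the actual logic in data_comparator.py may differ.

```python
def find_column(header_row, name):
    """Return the index of the column whose header equals name.

    Matching ignores surrounding whitespace (an assumption of this sketch).
    """
    wanted = name.strip()
    for index, cell in enumerate(header_row):
        if cell is not None and str(cell).strip() == wanted:
            return index
    raise KeyError(f"column {name!r} not found")


def extract_pairs(rows, title_header, episode_header):
    """Collect (title, episode) pairs; the first row is treated as the header."""
    header, *body = rows
    title_col = find_column(header, title_header)
    episode_col = find_column(header, episode_header)
    pairs = []
    for row in body:
        title, episode = row[title_col], row[episode_col]
        if title is None or episode is None:
            continue  # skip rows missing either half of the key
        # Normalise both halves to stripped strings so keys compare cleanly.
        pairs.append((str(title).strip(), str(episode).strip()))
    return pairs


# Hypothetical sheet: KST and Coordi columns side by side, located by name.
sheet = [
    ["No", "Title KR", "Epi.", "KR title", "Chap"],
    [1, "Title A", 1, "Title A", 1],
    [2, "Title A", 2, "Title C", 5],
    [3, "Title B", 1, None, None],
]

kst_pairs = extract_pairs(sheet, "Title KR", "Epi.")
coordi_pairs = extract_pairs(sheet, "KR title", "Chap")

# Attach a reason to every pair that has no counterpart on the other side.
mismatches = [(p, "missing from Coordi") for p in kst_pairs if p not in coordi_pairs]
mismatches += [(p, "missing from KST") for p in coordi_pairs if p not in kst_pairs]

for pair, reason in mismatches:
    print(pair, reason)
```

Looking columns up by header text means the comparison keeps working when columns are reordered or new ones are inserted. Note that converting episodes with `str` is a simplification: a value read as `1.0` would not match `1` without further normalisation.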