# Changes Summary - Data Comparison Logic Fix

## Issues Fixed

### 1. Removed All-Sheet Functionality
- **Problem**: The tool processed all sheets together, so items on different sheets were flagged as duplicates of each other
- **Solution**: Removed all-sheet processing entirely; the tool now processes one sheet at a time
- **Changes**:
  - Replaced `extract_kst_coordi_items()` with `extract_kst_coordi_items_for_sheet(sheet_name)`
  - Updated all comparison methods to operate on a single sheet

### 2. Fixed Duplicate Detection Logic
- **Problem**: Items appearing once on each side were incorrectly marked as duplicates
- **Solution**: Fixed `_find_duplicates_in_list()` to return only items that actually appear more than once
- **Changes**: Uses `Counter` to count occurrences and returns only items with count > 1

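
The counting approach described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the standalone function name `find_duplicates_in_list`, its signature, and the sample item keys are assumptions, and `Counter` is assumed to be `collections.Counter`.

```python
from collections import Counter


def find_duplicates_in_list(items):
    """Return the items that occur more than once, in first-seen order.

    Hypothetical stand-in for `_find_duplicates_in_list()`; the real
    method's signature and return type may differ.
    """
    counts = Counter(items)
    # Counter keeps keys in first-insertion order, so the result is stable.
    return [item for item, count in counts.items() if count > 1]


# Made-up item keys, for illustration only.
coordi_items = ["Title A - Episode 1", "Title B - Episode 2", "Title A - Episode 1"]
print(find_duplicates_in_list(coordi_items))             # ['Title A - Episode 1']
print(find_duplicates_in_list(["Title C - Episode 3"]))  # []
```

An item that appears exactly once has a count of 1 and is therefore never returned, which is the behavior the fix is meant to guarantee.
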
### 3. Implemented Mixed Duplicate Priority
- **Problem**: Some items were reported as both pure duplicates and mixed duplicates
- **Solution**: Mixed duplicates (items present in both datasets, with duplicates on one side) now take priority
- **Changes**: Mixed duplicates are generated first, and their keys are then excluded from the pure duplicate lists

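
The priority rule above can be sketched roughly as below. The function name `classify_duplicates`, the flat lists of item keys, and the returned dictionary layout are all assumptions made for illustration. The sketch also treats an item as mixed when it is repeated on at least one side, which is one reading of "duplicates on one side".

```python
from collections import Counter


def classify_duplicates(kst_items, coordi_items):
    """Split duplicated keys into mixed, KST-only and Coordi-only lists.

    Hypothetical helper; the real logic lives in
    `generate_mismatch_details_for_sheet()` and may differ.
    """
    kst_counts = Counter(kst_items)
    coordi_counts = Counter(coordi_items)

    # Step 1: mixed duplicates are present on both sides and repeated on
    # at least one of them.
    mixed = [
        key
        for key in kst_counts
        if key in coordi_counts
        and (kst_counts[key] > 1 or coordi_counts[key] > 1)
    ]
    mixed_keys = set(mixed)

    # Step 2: pure duplicates are repeated on one side, minus any key
    # already claimed as mixed.
    kst_only = [k for k, n in kst_counts.items() if n > 1 and k not in mixed_keys]
    coordi_only = [k for k, n in coordi_counts.items() if n > 1 and k not in mixed_keys]
    return {"mixed": mixed, "kst": kst_only, "coordi": coordi_only}


# Made-up item keys, for illustration only.
kst = ["A - Episode 1", "B - Episode 2", "B - Episode 2"]
coordi = ["A - Episode 1", "A - Episode 1", "C - Episode 3", "C - Episode 3"]
print(classify_duplicates(kst, coordi))
# {'mixed': ['A - Episode 1'], 'kst': ['B - Episode 2'], 'coordi': ['C - Episode 3']}
```

Because the mixed keys are computed first and subtracted afterwards, `A - Episode 1` appears only in the mixed list even though it is repeated in the Coordi data.
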
### 4. Sheet-Specific Analysis Only
- **Problem**: Cross-sheet contamination in duplicate detection
- **Solution**: All analysis now happens within the context of a single sheet
- **Changes**:
  - `get_comparison_summary()` now always applies a sheet filter, defaulting to the first sheet when none is given
  - Removed the old filtering methods and replaced them with sheet-specific extraction

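
As a rough sketch of the "one sheet at a time, default to the first sheet" behavior, assuming the workbook is available as a dictionary mapping sheet names to lists of item keys. The function `summarize_sheet`, that data shape, and the summary fields are hypothetical and do not reflect the real `get_comparison_summary()` signature.

```python
from collections import Counter


def summarize_sheet(rows_by_sheet, sheet_name=None):
    """Summarize a single sheet, falling back to the first sheet.

    Hypothetical helper; `rows_by_sheet` maps sheet name -> list of item keys.
    """
    if not rows_by_sheet:
        raise ValueError("no sheets to analyse")
    if sheet_name is None:
        # dicts preserve insertion order, so this is the first sheet loaded.
        sheet_name = next(iter(rows_by_sheet))
    if sheet_name not in rows_by_sheet:
        raise KeyError(f"unknown sheet: {sheet_name}")

    # Only this sheet's rows are counted, so other sheets cannot contribute.
    items = rows_by_sheet[sheet_name]
    counts = Counter(items)
    return {
        "sheet": sheet_name,
        "total_items": len(items),
        "duplicates": [key for key, n in counts.items() if n > 1],
    }


# Made-up data: "A - Episode 1" appears once on each sheet.
workbook = {
    "US URGENT": ["A - Episode 1", "B - Episode 2", "B - Episode 2"],
    "TH URGENT": ["A - Episode 1"],
}
print(summarize_sheet(workbook))
# {'sheet': 'US URGENT', 'total_items': 3, 'duplicates': ['B - Episode 2']}
print(summarize_sheet(workbook, "TH URGENT"))
# {'sheet': 'TH URGENT', 'total_items': 1, 'duplicates': []}
```

In this sketch `A - Episode 1` is never flagged, because each sheet is counted on its own; pooling both sheets would have given it a count of 2.
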
## BA Confirmed Cases - All Working ✅

### US URGENT Sheet
- ✅ `금수의 영역 - Episode 17` → Coordi duplicate
- ✅ `신결 - Episode 23` → Coordi duplicate
- ✅ `트윈 가이드 - Episode 31` → Mixed duplicate (exists in both, duplicated in Coordi)
- ✅ `트윈 가이드 - Episode 31` is no longer shown as a pure Coordi duplicate

### TH URGENT Sheet
- ✅ `백라이트 - Episode 53-1x(휴재)` → KST duplicate (does not appear in Coordi)

## Code Changes Made

### data_comparator.py
1. **New Methods**:
   - `extract_kst_coordi_items_for_sheet(sheet_name)` - Sheet-specific extraction
   - `categorize_mismatches_for_sheet(sheet_data)` - Sheet-specific categorization
   - `generate_mismatch_details_for_sheet()` - Sheet-specific mismatch details with priority logic
   - `group_by_title_for_sheet()` - Sheet-specific grouping

2. **Updated Methods**:
   - `_find_duplicates_in_list()` - Fixed to only return actual duplicates
   - `get_comparison_summary()` - Now sheet-specific only
   - `print_comparison_summary()` - Added sheet name to output

3. **Removed Methods**:
   - `extract_kst_coordi_items()` - Replaced with sheet-specific version
   - `categorize_mismatches()` - Replaced with sheet-specific version
   - `generate_mismatch_details()` - Replaced with sheet-specific version
   - `group_by_title()` - Replaced with sheet-specific version
   - `filter_by_sheet()` - No longer needed
   - `filter_grouped_data_by_sheet()` - No longer needed
   - `calculate_filtered_counts()` - No longer needed

### web_gui.py
- Updated matched-items extraction to use the new grouped data structure
- Removed the dependency on the old `categorize_mismatches()` method

### Test Files
- `test_ba_confirmed_cases.py` - New test that verifies the BA confirmed cases
- `test_sheet_filtering.py` - Updated to work with the new sheet-specific logic

## Performance Improvements
- Faster analysis, since each run processes a single sheet with no cross-sheet processing
- More accurate duplicate detection
- Cleaner separation of concerns between sheets

## Verification
All tests pass:
- ✅ Sheet filtering works correctly
- ✅ Duplicate detection is accurate
- ✅ BA confirmed cases match expectations
- ✅ Web interface works properly
- ✅ Mixed duplicates take priority over pure duplicates