Changes Summary - Data Comparison Logic Fix
Issues Fixed
1. Removed All-Sheet Functionality
- Problem: The tool was processing all sheets together, causing cross-sheet duplicate detection
- Solution: Completely removed all-sheet functionality, now only processes one sheet at a time
- Changes:
  - Replaced extract_kst_coordi_items() with extract_kst_coordi_items_for_sheet(sheet_name)
  - Updated all comparison methods to work sheet-specifically
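A minimal sketch of why per-sheet extraction avoids cross-sheet false positives. The row layout (sheet, source, key tuples), the sample key, and the standalone function extract_items_for_sheet are assumptions for illustration only; the actual extract_kst_coordi_items_for_sheet(sheet_name) in data_comparator.py and its data model are not shown in this summary.

```python
from collections import Counter

# Hypothetical rows: (sheet, source, key). The real data model may differ.
ROWS = [
    ("US URGENT", "KST", "Title A - Episode 1"),
    ("US URGENT", "Coordi", "Title A - Episode 1"),
    ("TH URGENT", "KST", "Title A - Episode 1"),
]


def extract_items_for_sheet(rows, sheet_name):
    """Return (kst_keys, coordi_keys) for a single sheet only."""
    kst = [key for sheet, source, key in rows
           if sheet == sheet_name and source == "KST"]
    coordi = [key for sheet, source, key in rows
              if sheet == sheet_name and source == "Coordi"]
    return kst, coordi


# Pooling every sheet counts the same key twice on the KST side.
pooled_kst = Counter(key for _, source, key in ROWS if source == "KST")
print(pooled_kst["Title A - Episode 1"])  # 2: a false cross-sheet duplicate

# Restricting to one sheet leaves a single occurrence per side.
kst_us, coordi_us = extract_items_for_sheet(ROWS, "US URGENT")
print(kst_us, coordi_us)
```

Filtering before counting, rather than counting first and filtering the results afterwards, is what keeps a key that legitimately appears on two sheets from being reported as a duplicate.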
2. Fixed Duplicate Detection Logic
- Problem: Items appearing once on each side were incorrectly marked as duplicates
- Solution: Fixed _find_duplicates_in_list() to only return items that actually appear multiple times
- Changes: Used Counter to count occurrences and only return items with count > 1
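The counting approach described above can be sketched as follows. This is an illustrative standalone function, not the project's actual _find_duplicates_in_list(); it assumes items are hashable keys such as title and episode strings, and that Counter refers to collections.Counter.

```python
from collections import Counter


def find_duplicates_in_list(items):
    """Return items that occur more than once, in first-seen order."""
    counts = Counter(items)
    # Keep only keys whose count exceeds 1; a single occurrence is not a duplicate.
    return [item for item, count in counts.items() if count > 1]


# An item that appears once on each side is not a duplicate on either side.
kst = ["Title A - Episode 1", "Title B - Episode 2"]
coordi = ["Title A - Episode 1", "Title C - Episode 3", "Title C - Episode 3"]
print(find_duplicates_in_list(kst))     # []
print(find_duplicates_in_list(coordi))  # ['Title C - Episode 3']
```

Each side is counted separately, so a key shared by both sides only becomes a duplicate when one list contains it more than once.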
3. Implemented Mixed Duplicate Priority
- Problem: Items showing as both pure duplicates and mixed duplicates
- Solution: Mixed duplicates (items in both datasets with duplicates on one side) now take priority
- Changes: Generate mixed duplicates first, then exclude those keys from pure duplicate lists
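One way to express this priority rule, as a hedged sketch: compute the mixed set first, then filter it out of each side's pure duplicate list. The function name classify_duplicates and the returned dictionary shape are hypothetical; the real logic lives in generate_mismatch_details_for_sheet() and may be structured differently.

```python
from collections import Counter


def classify_duplicates(kst_items, coordi_items):
    """Split duplicated keys into mixed, pure KST, and pure Coordi groups."""
    kst_counts = Counter(kst_items)
    coordi_counts = Counter(coordi_items)

    # Step 1: mixed duplicates are keys present on both sides
    # that are duplicated on at least one of them.
    mixed = {
        key
        for key in kst_counts.keys() & coordi_counts.keys()
        if kst_counts[key] > 1 or coordi_counts[key] > 1
    }

    # Step 2: pure duplicates exclude anything already classified as mixed.
    kst_only = [k for k, n in kst_counts.items() if n > 1 and k not in mixed]
    coordi_only = [k for k, n in coordi_counts.items() if n > 1 and k not in mixed]

    return {"mixed": sorted(mixed), "kst": kst_only, "coordi": coordi_only}


result = classify_duplicates(
    ["a", "a", "b", "c", "c"],  # KST side
    ["b", "b", "c", "d", "d"],  # Coordi side
)
print(result)  # {'mixed': ['b', 'c'], 'kst': ['a'], 'coordi': ['d']}
```

Because the exclusion happens after the mixed set is built, every duplicated key lands in exactly one category.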
4. Sheet-Specific Analysis Only
- Problem: Cross-sheet contamination in duplicate detection
- Solution: All analysis now happens within a single sheet context
- Changes:
  - get_comparison_summary() now requires a sheet filter and defaults to the first sheet when none is given
  - Removed old filtering methods, replaced with sheet-specific extraction
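The default-to-first-sheet behaviour might look like the sketch below. The helper resolve_sheet, its parameters, and the choice to raise ValueError for an unknown or missing sheet are all assumptions for illustration; the summary does not say how get_comparison_summary() handles those cases.

```python
def resolve_sheet(sheet_names, sheet_filter=None):
    """Pick the sheet to analyse, falling back to the first available sheet."""
    if not sheet_names:
        # Assumed behaviour: nothing to analyse without at least one sheet.
        raise ValueError("no sheets available")
    if sheet_filter is None:
        return sheet_names[0]
    if sheet_filter not in sheet_names:
        # Assumed behaviour: reject unknown sheets instead of silently falling back.
        raise ValueError(f"unknown sheet: {sheet_filter!r}")
    return sheet_filter


sheets = ["US URGENT", "TH URGENT"]
print(resolve_sheet(sheets))               # US URGENT
print(resolve_sheet(sheets, "TH URGENT"))  # TH URGENT
```

Resolving the sheet once, up front, means every later step can assume it is working within a single sheet.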
BA Confirmed Cases - All Working ✅
US URGENT Sheet
- ✅ 금수의 영역 - Episode 17 → Coordi duplicate
- ✅ 신결 - Episode 23 → Coordi duplicate
- ✅ 트윈 가이드 - Episode 31 → Mixed duplicate (exists in both, duplicates in Coordi)
- ✅ No longer shows 트윈 가이드 - Episode 31 as pure Coordi duplicate
TH URGENT Sheet
- ✅ 백라이트 - Episode 53-1x(휴재) → KST duplicate (doesn't appear in Coordi)
Code Changes Made
data_comparator.py
- New Methods:
  - extract_kst_coordi_items_for_sheet(sheet_name) - Sheet-specific extraction
  - categorize_mismatches_for_sheet(sheet_data) - Sheet-specific categorization
  - generate_mismatch_details_for_sheet() - Sheet-specific mismatch details with priority logic
  - group_by_title_for_sheet() - Sheet-specific grouping
- Updated Methods:
  - _find_duplicates_in_list() - Fixed to only return actual duplicates
  - get_comparison_summary() - Now sheet-specific only
  - print_comparison_summary() - Added sheet name to output
- Removed Methods:
  - extract_kst_coordi_items() - Replaced with sheet-specific version
  - categorize_mismatches() - Replaced with sheet-specific version
  - generate_mismatch_details() - Replaced with sheet-specific version
  - group_by_title() - Replaced with sheet-specific version
  - filter_by_sheet() - No longer needed
  - filter_grouped_data_by_sheet() - No longer needed
  - calculate_filtered_counts() - No longer needed
web_gui.py
- Updated matched items extraction to use new grouped data structure
- Removed dependency on old categorize_mismatches() method
Test Files
- test_ba_confirmed_cases.py - New test to verify BA confirmed expectations
- test_sheet_filtering.py - Updated to work with new sheet-specific logic
Performance Improvements
- Faster analysis since no cross-sheet processing
- More accurate duplicate detection
- Cleaner separation of concerns between sheets
Verification
All tests pass:
- ✅ Sheet filtering works correctly
- ✅ Duplicate detection is accurate
- ✅ BA confirmed cases match expectations
- ✅ Web interface works properly
- ✅ Mixed duplicates take priority over pure duplicates