add final logic v120250820

parent ed3655d1c9
commit 99470f501a

CHANGES_SUMMARY.md (new file, 82 lines)
@@ -0,0 +1,82 @@
# Changes Summary - Data Comparison Logic Fix

## Issues Fixed

### 1. Removed All-Sheet Functionality
- **Problem**: The tool was processing all sheets together, causing cross-sheet duplicate detection
- **Solution**: Completely removed the all-sheet functionality; the tool now only processes one sheet at a time
- **Changes**:
  - Replaced `extract_kst_coordi_items()` with `extract_kst_coordi_items_for_sheet(sheet_name)` (usage sketch below)
  - Updated all comparison methods to work sheet-specifically
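A minimal usage sketch of the new per-sheet entry point. This is illustrative only; it assumes the `data/sample-data.xlsx` workbook used by the tests, and the dictionary keys shown are the ones returned by the new method in the `data_comparator.py` diff further down.

```python
# Illustrative sketch - not part of the commit. Each sheet is extracted on its own,
# so no (title, episode) pair from one sheet is compared against another sheet.
from data_comparator import KSTCoordiComparator

comparator = KSTCoordiComparator("data/sample-data.xlsx")
if comparator.load_data():
    for sheet_name in comparator.data.keys():
        sheet_data = comparator.extract_kst_coordi_items_for_sheet(sheet_name)
        print(sheet_name,
              len(sheet_data['kst_items']),     # unique KST (title, episode) items
              len(sheet_data['coordi_items']))  # unique Coordi (title, episode) items
```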

### 2. Fixed Duplicate Detection Logic
- **Problem**: Items appearing once on each side were incorrectly marked as duplicates
- **Solution**: Fixed `_find_duplicates_in_list()` to only return items that actually appear multiple times
- **Changes**: Used `Counter` to count occurrences and only return items with a count greater than 1 (see the sketch below)
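A minimal sketch of that counting rule, assuming the items carry `title` and `episode` attributes as the `ComparisonItem` objects in `data_comparator.py` do; the real fix is in `_find_duplicates_in_list()` in the diff further down.

```python
# Illustrative sketch - an item counts as a duplicate only when its (title, episode)
# key occurs more than once within the same dataset.
from collections import Counter

def find_duplicates(items):
    key_counts = Counter((item.title, item.episode) for item in items)
    return [item for item in items if key_counts[(item.title, item.episode)] > 1]
```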

### 3. Implemented Mixed Duplicate Priority
- **Problem**: Items showing up as both pure duplicates and mixed duplicates
- **Solution**: Mixed duplicates (items present in both datasets with duplicates on one side) now take priority
- **Changes**: Generate mixed duplicates first, then exclude those keys from the pure duplicate lists (illustrated below)
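A toy illustration of the exclusion rule, using the BA-confirmed US URGENT cases; the production logic lives in `generate_mismatch_details_for_sheet()` in the diff further down.

```python
# Illustrative only - plain tuples stand in for the real duplicate records.
coordi_duplicates = [('금수의 영역', '17'), ('신결', '23'), ('트윈 가이드', '31')]
mixed_duplicates = [('트윈 가이드', '31')]  # present in both datasets, duplicated on the Coordi side

mixed_keys = set(mixed_duplicates)
pure_coordi_duplicates = [key for key in coordi_duplicates if key not in mixed_keys]
# -> [('금수의 영역', '17'), ('신결', '23')]
# 트윈 가이드 - Episode 31 is reported only once, as a mixed duplicate.
```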

### 4. Sheet-Specific Analysis Only
- **Problem**: Cross-sheet contamination in duplicate detection
- **Solution**: All analysis now happens within a single sheet context
- **Changes**:
  - `get_comparison_summary()` now requires a sheet filter and defaults to the first sheet (example call below)
  - Removed the old filtering methods and replaced them with sheet-specific extraction
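A minimal sketch of the sheet-scoped summary call. Illustrative only; it assumes the sample workbook used by the tests and passes the sheet filter explicitly, and the keys read here are the ones populated in the `get_comparison_summary()` diff further down.

```python
# Illustrative sketch - the sheet filter is passed explicitly, as the tests do.
from data_comparator import KSTCoordiComparator

comparator = KSTCoordiComparator("data/sample-data.xlsx")
if comparator.load_data():
    summary = comparator.get_comparison_summary('US URGENT')
    print(summary['current_sheet_filter'])                   # 'US URGENT'
    print(summary['matched_items_count'])
    print(summary['mismatches']['mixed_duplicates_count'])
```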

## BA Confirmed Cases - All Working ✅

### US URGENT Sheet
- ✅ `금수의 영역 - Episode 17` → Coordi duplicate
- ✅ `신결 - Episode 23` → Coordi duplicate
- ✅ `트윈 가이드 - Episode 31` → Mixed duplicate (exists in both, duplicates in Coordi)
- ✅ No longer shows `트윈 가이드 - Episode 31` as a pure Coordi duplicate

### TH URGENT Sheet
- ✅ `백라이트 - Episode 53-1x(휴재)` → KST duplicate (doesn't appear in Coordi)

## Code Changes Made

### data_comparator.py

1. **New Methods**:
   - `extract_kst_coordi_items_for_sheet(sheet_name)` - Sheet-specific extraction
   - `categorize_mismatches_for_sheet(sheet_data)` - Sheet-specific categorization
   - `generate_mismatch_details_for_sheet()` - Sheet-specific mismatch details with priority logic
   - `group_by_title_for_sheet()` - Sheet-specific grouping

2. **Updated Methods**:
   - `_find_duplicates_in_list()` - Fixed to only return actual duplicates
   - `get_comparison_summary()` - Now sheet-specific only
   - `print_comparison_summary()` - Added the sheet name to the output

3. **Removed Methods**:
   - `extract_kst_coordi_items()` - Replaced with the sheet-specific version
   - `categorize_mismatches()` - Replaced with the sheet-specific version
   - `generate_mismatch_details()` - Replaced with the sheet-specific version
   - `group_by_title()` - Replaced with the sheet-specific version
   - `filter_by_sheet()` - No longer needed
   - `filter_grouped_data_by_sheet()` - No longer needed
   - `calculate_filtered_counts()` - No longer needed

The sketch below shows how the new methods chain together inside `get_comparison_summary()`.
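Condensed, illustrative view of the new call flow (not the full implementation; the real code is in the `data_comparator.py` diff further down):

```python
# Illustrative sketch - every step operates on a single sheet's data.
from data_comparator import KSTCoordiComparator

comparator = KSTCoordiComparator("data/sample-data.xlsx")
comparator.load_data()

sheet_data = comparator.extract_kst_coordi_items_for_sheet('US URGENT')
categorization = comparator.categorize_mismatches_for_sheet(sheet_data)
mismatch_details = comparator.generate_mismatch_details_for_sheet(categorization, sheet_data, 'US URGENT')
grouped_data = comparator.group_by_title_for_sheet(categorization, 'US URGENT')
```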

### web_gui.py
- Updated the matched items extraction to use the new grouped data structure (sketched below)
- Removed the dependency on the old `categorize_mismatches()` method
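A minimal sketch of the lookup the GUI now performs; it mirrors the `web_gui.py` hunk near the end of this commit and assumes the same sample workbook and the US URGENT sheet.

```python
# Illustrative sketch - matched rows come from the grouped structure in the summary,
# instead of re-running the removed categorize_mismatches().
from data_comparator import KSTCoordiComparator

comparator = KSTCoordiComparator("data/sample-data.xlsx")
comparator.load_data()

results = comparator.get_comparison_summary('US URGENT')
matched_rows = []
for title, items in results['grouped_by_title']['matched_by_title'].items():
    for item in items[:500]:  # same per-title limit the GUI applies for performance
        matched_rows.append((item['title'], item['episode'], item['sheet']))
```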

### Test Files
- `test_ba_confirmed_cases.py` - New test to verify the BA confirmed expectations
- `test_sheet_filtering.py` - Updated to work with the new sheet-specific logic

## Performance Improvements
- Faster analysis, since there is no cross-sheet processing
- More accurate duplicate detection
- Cleaner separation of concerns between sheets

## Verification
All tests pass:
- ✅ Sheet filtering works correctly
- ✅ Duplicate detection is accurate
- ✅ BA confirmed cases match expectations
- ✅ The web interface works properly
- ✅ Mixed duplicates take priority over pure duplicates

CLAUDE.md (15 lines changed)

@@ -53,7 +53,14 @@ The project uses Python 3.13+ with uv for dependency management. Dependencies in
## Comparison Logic

The tool compares Excel data by:
-1. Finding columns by header names (not positions)
-2. Extracting title+episode combinations from both datasets
-3. Categorizing mismatches and calculating reconciliation
-4. Displaying results with reasons for each discrepancy
+1. **Sheet-specific analysis only** - No more "All Sheets" functionality, each sheet is analyzed independently
+2. Finding columns by header names (not positions)
+3. Extracting title+episode combinations from both datasets within the selected sheet
+4. **Fixed duplicate detection** - Only items that appear multiple times within the same dataset are marked as duplicates
+5. **Mixed duplicate priority** - Items that exist in both datasets but have duplicates on one side are prioritized over pure duplicates
+6. Categorizing mismatches and calculating reconciliation
+7. Displaying results with reasons for each discrepancy
+
+### BA Confirmed Cases
+- **US URGENT**: `금수의 영역 - Episode 17`, `신결 - Episode 23` (Coordi duplicates), `트윈 가이드 - Episode 31` (mixed duplicate)
+- **TH URGENT**: `백라이트 - Episode 53-1x(휴재)` (KST duplicate, doesn't appear in Coordi)

data_comparator.py

@@ -42,8 +42,14 @@ class KSTCoordiComparator:
            print(f"Error loading data: {e}")
            return False

-    def extract_kst_coordi_items(self) -> Dict[str, Any]:
-        """Extract KST and Coordi items from all sheets using column header names"""
+    def extract_kst_coordi_items_for_sheet(self, sheet_name: str) -> Dict[str, Any]:
+        """Extract KST and Coordi items from a specific sheet using column header names"""
+        if sheet_name not in self.data:
+            raise ValueError(f"Sheet '{sheet_name}' not found in data")
+
+        df = self.data[sheet_name]
+        columns = df.columns.tolist()
+
        kst_items = set()
        coordi_items = set()
        kst_details = []
@@ -51,96 +57,88 @@ class KSTCoordiComparator:
        kst_all_items = [] # Keep all items including duplicates
        coordi_all_items = [] # Keep all items including duplicates

-        for sheet_name, df in self.data.items():
-            columns = df.columns.tolist()
-
-            # Find columns by header names
-            # KST columns: 'Title KR' and 'Epi.'
-            # Coordi columns: 'KR title' and 'Chap'
-
-            kst_title_col = None
-            kst_episode_col = None
-            coordi_title_col = None
-            coordi_episode_col = None
-
-            # Find KST columns
-            for col in columns:
-                if col == 'Title KR':
-                    kst_title_col = col
-                elif col == 'Epi.':
-                    kst_episode_col = col
-
-            # Find Coordi columns
-            for col in columns:
-                if col == 'KR title':
-                    coordi_title_col = col
-                elif col == 'Chap':
-                    coordi_episode_col = col
-
-            print(f"Sheet: {sheet_name}")
-            print(f" KST columns - Title: {kst_title_col}, Episode: {kst_episode_col}")
-            print(f" Coordi columns - Title: {coordi_title_col}, Episode: {coordi_episode_col}")
-
-            # Extract items from each row
-            for idx, row in df.iterrows():
-                # Extract KST data
-                if kst_title_col and kst_episode_col:
-                    kst_title = str(row.get(kst_title_col, '')).strip()
-                    kst_episode = str(row.get(kst_episode_col, '')).strip()
-
-                    # Check if this row has valid KST data
-                    has_kst_data = (
-                        kst_title and kst_title != 'nan' and
-                        kst_episode and kst_episode != 'nan' and
-                        pd.notna(row[kst_title_col]) and pd.notna(row[kst_episode_col])
-                    )
-
-                    if has_kst_data:
-                        item = ComparisonItem(kst_title, kst_episode, sheet_name, idx)
-                        kst_items.add(item)
-                        kst_all_items.append(item) # Keep all items for duplicate detection
-                        kst_details.append({
-                            'title': kst_title,
-                            'episode': kst_episode,
-                            'sheet': sheet_name,
-                            'row_index': idx,
-                            'kst_data': {
-                                kst_title_col: row[kst_title_col],
-                                kst_episode_col: row[kst_episode_col]
-                            }
-                        })
-
-                # Extract Coordi data
-                if coordi_title_col and coordi_episode_col:
-                    coordi_title = str(row.get(coordi_title_col, '')).strip()
-                    coordi_episode = str(row.get(coordi_episode_col, '')).strip()
-
-                    # Check if this row has valid Coordi data
-                    has_coordi_data = (
-                        coordi_title and coordi_title != 'nan' and
-                        coordi_episode and coordi_episode != 'nan' and
-                        pd.notna(row[coordi_title_col]) and pd.notna(row[coordi_episode_col])
-                    )
-
-                    if has_coordi_data:
-                        item = ComparisonItem(coordi_title, coordi_episode, sheet_name, idx)
-                        coordi_items.add(item)
-                        coordi_all_items.append(item) # Keep all items for duplicate detection
-                        coordi_details.append({
-                            'title': coordi_title,
-                            'episode': coordi_episode,
-                            'sheet': sheet_name,
-                            'row_index': idx,
-                            'coordi_data': {
-                                coordi_title_col: row[coordi_title_col],
-                                coordi_episode_col: row[coordi_episode_col]
-                            }
-                        })
-
-        self.kst_items = kst_items
-        self.coordi_items = coordi_items
-        self.kst_all_items = kst_all_items # Store for duplicate detection
-        self.coordi_all_items = coordi_all_items # Store for duplicate detection
+        # Find columns by header names
+        # KST columns: 'Title KR' and 'Epi.'
+        # Coordi columns: 'KR title' and 'Chap'
+
+        kst_title_col = None
+        kst_episode_col = None
+        coordi_title_col = None
+        coordi_episode_col = None
+
+        # Find KST columns
+        for col in columns:
+            if col == 'Title KR':
+                kst_title_col = col
+            elif col == 'Epi.':
+                kst_episode_col = col
+
+        # Find Coordi columns
+        for col in columns:
+            if col == 'KR title':
+                coordi_title_col = col
+            elif col == 'Chap':
+                coordi_episode_col = col
+
+        print(f"Sheet: {sheet_name}")
+        print(f" KST columns - Title: {kst_title_col}, Episode: {kst_episode_col}")
+        print(f" Coordi columns - Title: {coordi_title_col}, Episode: {coordi_episode_col}")
+
+        # Extract items from each row
+        for idx, row in df.iterrows():
+            # Extract KST data
+            if kst_title_col and kst_episode_col:
+                kst_title = str(row.get(kst_title_col, '')).strip()
+                kst_episode = str(row.get(kst_episode_col, '')).strip()
+
+                # Check if this row has valid KST data
+                has_kst_data = (
+                    kst_title and kst_title != 'nan' and
+                    kst_episode and kst_episode != 'nan' and
+                    pd.notna(row[kst_title_col]) and pd.notna(row[kst_episode_col])
+                )
+
+                if has_kst_data:
+                    item = ComparisonItem(kst_title, kst_episode, sheet_name, idx)
+                    kst_items.add(item)
+                    kst_all_items.append(item) # Keep all items for duplicate detection
+                    kst_details.append({
+                        'title': kst_title,
+                        'episode': kst_episode,
+                        'sheet': sheet_name,
+                        'row_index': idx,
+                        'kst_data': {
+                            kst_title_col: row[kst_title_col],
+                            kst_episode_col: row[kst_episode_col]
+                        }
+                    })
+
+            # Extract Coordi data
+            if coordi_title_col and coordi_episode_col:
+                coordi_title = str(row.get(coordi_title_col, '')).strip()
+                coordi_episode = str(row.get(coordi_episode_col, '')).strip()
+
+                # Check if this row has valid Coordi data
+                has_coordi_data = (
+                    coordi_title and coordi_title != 'nan' and
+                    coordi_episode and coordi_episode != 'nan' and
+                    pd.notna(row[coordi_title_col]) and pd.notna(row[coordi_episode_col])
+                )
+
+                if has_coordi_data:
+                    item = ComparisonItem(coordi_title, coordi_episode, sheet_name, idx)
+                    coordi_items.add(item)
+                    coordi_all_items.append(item) # Keep all items for duplicate detection
+                    coordi_details.append({
+                        'title': coordi_title,
+                        'episode': coordi_episode,
+                        'sheet': sheet_name,
+                        'row_index': idx,
+                        'coordi_data': {
+                            coordi_title_col: row[coordi_title_col],
+                            coordi_episode_col: row[coordi_episode_col]
+                        }
+                    })

        return {
            'kst_items': kst_items,
@@ -151,19 +149,21 @@ class KSTCoordiComparator:
            'coordi_all_items': coordi_all_items
        }

-    def categorize_mismatches(self) -> Dict[str, Any]:
-        """Categorize data into KST-only, Coordi-only, and matched items"""
-        if not self.kst_items or not self.coordi_items:
-            self.extract_kst_coordi_items()
+    def categorize_mismatches_for_sheet(self, sheet_data: Dict[str, Any]) -> Dict[str, Any]:
+        """Categorize data into KST-only, Coordi-only, and matched items for a specific sheet"""
+        kst_items = sheet_data['kst_items']
+        coordi_items = sheet_data['coordi_items']
+        kst_all_items = sheet_data['kst_all_items']
+        coordi_all_items = sheet_data['coordi_all_items']

        # Find overlaps and differences
-        matched_items = self.kst_items.intersection(self.coordi_items)
-        kst_only_items = self.kst_items - self.coordi_items
-        coordi_only_items = self.coordi_items - self.kst_items
+        matched_items = kst_items.intersection(coordi_items)
+        kst_only_items = kst_items - coordi_items
+        coordi_only_items = coordi_items - kst_items

-        # Find duplicates within each dataset
-        kst_duplicates = self._find_duplicates_in_list(self.kst_all_items)
-        coordi_duplicates = self._find_duplicates_in_list(self.coordi_all_items)
+        # Find duplicates within each dataset - FIXED LOGIC
+        kst_duplicates = self._find_duplicates_in_list(kst_all_items)
+        coordi_duplicates = self._find_duplicates_in_list(coordi_all_items)

        categorization = {
            'matched_items': list(matched_items),

@@ -172,8 +172,8 @@ class KSTCoordiComparator:
            'kst_duplicates': kst_duplicates,
            'coordi_duplicates': coordi_duplicates,
            'counts': {
-                'total_kst': len(self.kst_items),
-                'total_coordi': len(self.coordi_items),
+                'total_kst': len(kst_items),
+                'total_coordi': len(coordi_items),
                'matched': len(matched_items),
                'kst_only': len(kst_only_items),
                'coordi_only': len(coordi_only_items),

@@ -187,8 +187,8 @@ class KSTCoordiComparator:
        reconciled_coordi_count = len(matched_items)

        categorization['reconciliation'] = {
-            'original_kst_count': len(self.kst_items),
-            'original_coordi_count': len(self.coordi_items),
+            'original_kst_count': len(kst_items),
+            'original_coordi_count': len(coordi_items),
            'reconciled_kst_count': reconciled_kst_count,
            'reconciled_coordi_count': reconciled_coordi_count,
            'counts_match_after_reconciliation': reconciled_kst_count == reconciled_coordi_count,
@@ -199,30 +199,27 @@ class KSTCoordiComparator:
        return categorization

    def _find_duplicates_in_list(self, items_list: List[ComparisonItem]) -> List[ComparisonItem]:
-        """Find duplicate items within a dataset"""
-        seen = set()
-        duplicates = []
+        """Find duplicate items within a dataset - FIXED to only return actual duplicates"""
+        from collections import Counter

+        # Count occurrences of each (title, episode) pair
+        key_counts = Counter((item.title, item.episode) for item in items_list)
+
+        # Only return items that appear more than once
+        duplicates = []
        for item in items_list:
            key = (item.title, item.episode)
-            if key in seen:
+            if key_counts[key] > 1:
                duplicates.append(item)
-            else:
-                seen.add(key)

        return duplicates

-    def _find_sheet_specific_mixed_duplicates(self, sheet_filter: str) -> List[Dict]:
+    def _find_sheet_specific_mixed_duplicates(self, sheet_data: Dict[str, Any], sheet_filter: str) -> List[Dict]:
        """Find mixed duplicates within a specific sheet only"""
-        if not sheet_filter:
-            return []
-
        mixed_duplicates = []

-        # Extract items specific to this sheet
-        extract_results = self.extract_kst_coordi_items()
-        kst_sheet_items = [item for item in extract_results['kst_all_items'] if item.source_sheet == sheet_filter]
-        coordi_sheet_items = [item for item in extract_results['coordi_all_items'] if item.source_sheet == sheet_filter]
+        kst_sheet_items = sheet_data['kst_all_items']
+        coordi_sheet_items = sheet_data['coordi_all_items']

        # Find duplicates within this sheet
        kst_sheet_duplicates = self._find_duplicates_in_list(kst_sheet_items)

@@ -265,10 +262,8 @@ class KSTCoordiComparator:

        return mixed_duplicates

-    def generate_mismatch_details(self) -> Dict[str, List[Dict]]:
-        """Generate detailed information about each type of mismatch with reasons"""
-        categorization = self.categorize_mismatches()
-
+    def generate_mismatch_details_for_sheet(self, categorization: Dict[str, Any], sheet_data: Dict[str, Any], sheet_filter: str) -> Dict[str, List[Dict]]:
+        """Generate detailed information about each type of mismatch with reasons for a specific sheet"""
        mismatch_details = {
            'kst_only': [],
            'coordi_only': [],
@@ -299,35 +294,43 @@ class KSTCoordiComparator:
                'mismatch_type': 'COORDI_ONLY'
            })

-        # KST duplicates
+        # Find mixed duplicates first (they take priority)
+        mixed_duplicates = self._find_sheet_specific_mixed_duplicates(sheet_data, sheet_filter)
+        mismatch_details['mixed_duplicates'] = mixed_duplicates
+
+        # Create set of items that are already covered by mixed duplicates
+        mixed_duplicate_keys = {(item['title'], item['episode']) for item in mixed_duplicates}
+
+        # KST duplicates - exclude those already covered by mixed duplicates
        for item in categorization['kst_duplicates']:
-            mismatch_details['kst_duplicates'].append({
-                'title': item.title,
-                'episode': item.episode,
-                'sheet': item.source_sheet,
-                'row_index': item.row_index,
-                'reason': 'Duplicate entry in KST data',
-                'mismatch_type': 'KST_DUPLICATE'
-            })
+            key = (item.title, item.episode)
+            if key not in mixed_duplicate_keys:
+                mismatch_details['kst_duplicates'].append({
+                    'title': item.title,
+                    'episode': item.episode,
+                    'sheet': item.source_sheet,
+                    'row_index': item.row_index,
+                    'reason': 'Duplicate entry in KST data',
+                    'mismatch_type': 'KST_DUPLICATE'
+                })

-        # Coordi duplicates
+        # Coordi duplicates - exclude those already covered by mixed duplicates
        for item in categorization['coordi_duplicates']:
-            mismatch_details['coordi_duplicates'].append({
-                'title': item.title,
-                'episode': item.episode,
-                'sheet': item.source_sheet,
-                'row_index': item.row_index,
-                'reason': 'Duplicate entry in Coordi data',
-                'mismatch_type': 'COORDI_DUPLICATE'
-            })
-        # Mixed duplicates will be calculated per sheet in get_comparison_summary
-        mismatch_details['mixed_duplicates'] = []
+            key = (item.title, item.episode)
+            if key not in mixed_duplicate_keys:
+                mismatch_details['coordi_duplicates'].append({
+                    'title': item.title,
+                    'episode': item.episode,
+                    'sheet': item.source_sheet,
+                    'row_index': item.row_index,
+                    'reason': 'Duplicate entry in Coordi data',
+                    'mismatch_type': 'COORDI_DUPLICATE'
+                })

        return mismatch_details

    def get_comparison_summary(self, sheet_filter: str = None) -> Dict[str, Any]:
-        """Get a comprehensive summary of the comparison, filtered by a specific sheet"""
+        """Get a comprehensive summary of the comparison for a specific sheet only"""
        # Get sheet names for filtering options
        sheet_names = list(self.data.keys()) if self.data else []

@@ -338,33 +341,37 @@ class KSTCoordiComparator:
        if not sheet_filter:
            raise ValueError("No sheets available or sheet filter not specified")

-        categorization = self.categorize_mismatches()
-        mismatch_details = self.generate_mismatch_details()
-        grouped_data = self.group_by_title()
+        # Extract data for the specific sheet only
+        sheet_data = self.extract_kst_coordi_items_for_sheet(sheet_filter)

-        # Always apply sheet filtering (no more "All Sheets" option)
-        mismatch_details = self.filter_by_sheet(mismatch_details, sheet_filter)
-        grouped_data = self.filter_grouped_data_by_sheet(grouped_data, sheet_filter)
+        # Categorize mismatches for this sheet
+        categorization = self.categorize_mismatches_for_sheet(sheet_data)

-        # Calculate mixed duplicates specific to this sheet
-        mismatch_details['mixed_duplicates'] = self._find_sheet_specific_mixed_duplicates(sheet_filter)
+        # Generate mismatch details for this sheet
+        mismatch_details = self.generate_mismatch_details_for_sheet(categorization, sheet_data, sheet_filter)

-        # Recalculate counts for filtered data
-        filtered_counts = self.calculate_filtered_counts(mismatch_details)
+        # Group data by title for this sheet
+        grouped_data = self.group_by_title_for_sheet(categorization, sheet_filter)
+
+        # Calculate counts
+        matched_count = len(categorization['matched_items'])
+        kst_total = len(sheet_data['kst_items'])
+        coordi_total = len(sheet_data['coordi_items'])

        summary = {
            'sheet_names': sheet_names,
            'current_sheet_filter': sheet_filter,
            'original_counts': {
-                'kst_total': filtered_counts['kst_total'],
-                'coordi_total': filtered_counts['coordi_total']
+                'kst_total': kst_total,
+                'coordi_total': coordi_total
            },
-            'matched_items_count': filtered_counts['matched'],
+            'matched_items_count': matched_count,
            'mismatches': {
-                'kst_only_count': filtered_counts['kst_only_count'],
-                'coordi_only_count': filtered_counts['coordi_only_count'],
-                'kst_duplicates_count': filtered_counts['kst_duplicates_count'],
-                'coordi_duplicates_count': filtered_counts['coordi_duplicates_count']
+                'kst_only_count': len(mismatch_details['kst_only']),
+                'coordi_only_count': len(mismatch_details['coordi_only']),
+                'kst_duplicates_count': len(mismatch_details['kst_duplicates']),
+                'coordi_duplicates_count': len(mismatch_details['coordi_duplicates']),
+                'mixed_duplicates_count': len(mismatch_details['mixed_duplicates'])
            },
            'reconciliation': categorization['reconciliation'],
            'mismatch_details': mismatch_details,
@@ -373,67 +380,8 @@ class KSTCoordiComparator:

        return summary

-    def filter_by_sheet(self, mismatch_details: Dict[str, List], sheet_filter: str) -> Dict[str, List]:
-        """Filter mismatch details by specific sheet"""
-        filtered = {}
-        for category, items in mismatch_details.items():
-            filtered[category] = [item for item in items if item.get('sheet') == sheet_filter]
-        return filtered
-
-    def filter_grouped_data_by_sheet(self, grouped_data: Dict, sheet_filter: str) -> Dict:
-        """Filter grouped data by specific sheet"""
-        filtered = {
-            'kst_only_by_title': {},
-            'coordi_only_by_title': {},
-            'matched_by_title': {},
-            'title_summaries': {}
-        }
-
-        # Filter each category
-        for category in ['kst_only_by_title', 'coordi_only_by_title', 'matched_by_title']:
-            for title, items in grouped_data[category].items():
-                filtered_items = [item for item in items if item.get('sheet') == sheet_filter]
-                if filtered_items:
-                    filtered[category][title] = filtered_items
-
-        # Recalculate title summaries for filtered data
-        all_titles = set()
-        all_titles.update(filtered['kst_only_by_title'].keys())
-        all_titles.update(filtered['coordi_only_by_title'].keys())
-        all_titles.update(filtered['matched_by_title'].keys())
-
-        for title in all_titles:
-            kst_only_count = len(filtered['kst_only_by_title'].get(title, []))
-            coordi_only_count = len(filtered['coordi_only_by_title'].get(title, []))
-            matched_count = len(filtered['matched_by_title'].get(title, []))
-            total_episodes = kst_only_count + coordi_only_count + matched_count
-
-            filtered['title_summaries'][title] = {
-                'total_episodes': total_episodes,
-                'matched_count': matched_count,
-                'kst_only_count': kst_only_count,
-                'coordi_only_count': coordi_only_count,
-                'match_percentage': round((matched_count / total_episodes * 100) if total_episodes > 0 else 0, 1),
-                'has_mismatches': kst_only_count > 0 or coordi_only_count > 0
-            }
-
-        return filtered
-
-    def calculate_filtered_counts(self, filtered_mismatch_details: Dict[str, List]) -> Dict[str, int]:
-        """Calculate counts for filtered data"""
-        return {
-            'kst_total': len(filtered_mismatch_details['kst_only']) + len(filtered_mismatch_details['kst_duplicates']),
-            'coordi_total': len(filtered_mismatch_details['coordi_only']) + len(filtered_mismatch_details['coordi_duplicates']),
-            'matched': 0, # Will be calculated from matched data separately
-            'kst_only_count': len(filtered_mismatch_details['kst_only']),
-            'coordi_only_count': len(filtered_mismatch_details['coordi_only']),
-            'kst_duplicates_count': len(filtered_mismatch_details['kst_duplicates']),
-            'coordi_duplicates_count': len(filtered_mismatch_details['coordi_duplicates']),
-            'mixed_duplicates_count': len(filtered_mismatch_details.get('mixed_duplicates', []))
-        }
-
-    def group_by_title(self) -> Dict[str, Any]:
-        """Group mismatches and matches by KR title"""
+    def group_by_title_for_sheet(self, categorization: Dict[str, Any], sheet_filter: str) -> Dict[str, Any]:
+        """Group mismatches and matches by KR title for a specific sheet"""
        from collections import defaultdict

        grouped = {
@@ -443,33 +391,38 @@ class KSTCoordiComparator:
            'title_summaries': {}
        }

-        # Get mismatch details
-        mismatch_details = self.generate_mismatch_details()
-
        # Group KST only items by title
-        for item in mismatch_details['kst_only']:
-            title = item['title']
-            grouped['kst_only_by_title'][title].append(item)
+        for item in categorization['kst_only_items']:
+            title = item.title
+            grouped['kst_only_by_title'][title].append({
+                'title': item.title,
+                'episode': item.episode,
+                'sheet': item.source_sheet,
+                'row_index': item.row_index,
+                'reason': 'Item exists in KST data but not in Coordi data'
+            })

        # Group Coordi only items by title
-        for item in mismatch_details['coordi_only']:
-            title = item['title']
-            grouped['coordi_only_by_title'][title].append(item)
+        for item in categorization['coordi_only_items']:
+            title = item.title
+            grouped['coordi_only_by_title'][title].append({
+                'title': item.title,
+                'episode': item.episode,
+                'sheet': item.source_sheet,
+                'row_index': item.row_index,
+                'reason': 'Item exists in Coordi data but not in KST data'
+            })

        # Group matched items by title
-        if hasattr(self, 'kst_items') and hasattr(self, 'coordi_items'):
-            categorization = self.categorize_mismatches()
-            matched_items = categorization['matched_items']
-
-            for item in matched_items:
-                title = item.title
-                grouped['matched_by_title'][title].append({
-                    'title': item.title,
-                    'episode': item.episode,
-                    'sheet': item.source_sheet,
-                    'row_index': item.row_index,
-                    'reason': 'Perfect match'
-                })
+        for item in categorization['matched_items']:
+            title = item.title
+            grouped['matched_by_title'][title].append({
+                'title': item.title,
+                'episode': item.episode,
+                'sheet': item.source_sheet,
+                'row_index': item.row_index,
+                'reason': 'Perfect match'
+            })

        # Create summary for each title
        all_titles = set()
@@ -499,12 +452,14 @@ class KSTCoordiComparator:

        return grouped

-    def print_comparison_summary(self):
-        """Print a formatted summary of the comparison"""
-        summary = self.get_comparison_summary()
+    def print_comparison_summary(self, sheet_filter: str = None):
+        """Print a formatted summary of the comparison for a specific sheet"""
+        summary = self.get_comparison_summary(sheet_filter)

        print("=" * 80)
-        print("KST vs COORDI COMPARISON SUMMARY")
+        print(f"KST vs COORDI COMPARISON SUMMARY - Sheet: {summary['current_sheet_filter']}")
        print("=" * 80)

        print(f"Original Counts:")

@@ -520,6 +475,7 @@ class KSTCoordiComparator:
        print(f" Coordi Only: {summary['mismatches']['coordi_only_count']}")
        print(f" KST Duplicates: {summary['mismatches']['kst_duplicates_count']}")
        print(f" Coordi Duplicates: {summary['mismatches']['coordi_duplicates_count']}")
+        print(f" Mixed Duplicates: {summary['mismatches']['mixed_duplicates_count']}")
        print()

        print(f"Reconciliation:")
@@ -104,7 +104,17 @@
        }
        .summary-card h3 {
            margin-top: 0;
+            margin-bottom: 15px;
            color: #333;
+            font-size: 1.1em;
+        }
+        .summary-card p {
+            margin: 8px 0;
+            color: #555;
+        }
+        .summary-card span {
+            font-weight: bold;
+            color: #007bff;
        }
        .count-badge {
            display: inline-block;

@@ -196,6 +206,22 @@
        </div>

        <div id="summary" class="tab-content active">
+            <!-- Summary Cards Section -->
+            <div class="summary-grid">
+                <div class="summary-card">
+                    <h3>📊 Sheet Summary</h3>
+                    <p><strong>Current Sheet:</strong> <span id="current-sheet-name">-</span></p>
+                    <p><strong>Matched Items:</strong> <span id="summary-matched-count">0</span> (Same in both KST and Coordi)</p>
+                    <p><strong>Different Items:</strong> <span id="summary-different-count">0</span> (Total tasks excluding matched items)</p>
+                </div>
+                <div class="summary-card">
+                    <h3>🔍 Breakdown</h3>
+                    <p><strong>KST Only:</strong> <span id="summary-kst-only">0</span></p>
+                    <p><strong>Coordi Only:</strong> <span id="summary-coordi-only">0</span></p>
+                    <p><strong>Duplicates:</strong> <span id="summary-duplicates">0</span></p>
+                </div>
+            </div>
+
            <h3>Matched Items (Same in both KST and Coordi) <span id="matched-count-display" class="count-badge">0</span></h3>
            <div class="table-container">
                <table>

@@ -411,6 +437,18 @@
                    (results.mismatches.mixed_duplicates_count || 0);
                document.getElementById('different-count-display').textContent = totalDifferent.toLocaleString();
+
+                // Update summary section
+                document.getElementById('current-sheet-name').textContent = results.current_sheet_filter;
+                document.getElementById('summary-matched-count').textContent = results.matched_items_count.toLocaleString();
+                document.getElementById('summary-different-count').textContent = totalDifferent.toLocaleString();
+                document.getElementById('summary-kst-only').textContent = results.mismatches.kst_only_count.toLocaleString();
+                document.getElementById('summary-coordi-only').textContent = results.mismatches.coordi_only_count.toLocaleString();
+
+                // Calculate total duplicates (KST + Coordi + Mixed)
+                const totalDuplicates = results.mismatches.kst_duplicates_count + results.mismatches.coordi_duplicates_count +
+                    (results.mismatches.mixed_duplicates_count || 0);
+                document.getElementById('summary-duplicates').textContent = totalDuplicates.toLocaleString();

                // Update Summary tab (matched items)
                updateSummaryTable(results.matched_data);

test_ba_confirmed_cases.py (new file, 101 lines)
@@ -0,0 +1,101 @@
#!/usr/bin/env python3

from data_comparator import KSTCoordiComparator


def test_ba_confirmed_cases():
    """Test that the comparison logic matches BA confirmed expectations"""
    print("Testing BA confirmed duplicate cases...")

    # Create comparator and load data
    comparator = KSTCoordiComparator("data/sample-data.xlsx")
    if not comparator.load_data():
        print("Failed to load data!")
        return

    print("\n=== US URGENT Sheet - BA Confirmed Cases ===")
    us_summary = comparator.get_comparison_summary('US URGENT')

    # Check for expected duplicates in US URGENT
    coordi_duplicates = us_summary['mismatch_details']['coordi_duplicates']
    mixed_duplicates = us_summary['mismatch_details']['mixed_duplicates']

    expected_coordi_duplicates = [
        ('금수의 영역', '17'),
        ('신결', '23')
    ]

    expected_mixed_duplicates = [
        ('트윈 가이드', '31')
    ]

    print("Coordi duplicates found:")
    found_coordi = []
    for item in coordi_duplicates:
        key = (item['title'], item['episode'])
        found_coordi.append(key)
        print(f" - {item['title']} - Episode {item['episode']}")

    print("\nMixed duplicates found:")
    found_mixed = []
    for item in mixed_duplicates:
        key = (item['title'], item['episode'])
        found_mixed.append(key)
        print(f" - {item['title']} - Episode {item['episode']} ({item['reason']})")

    # Verify expected cases
    print("\n✓ Verification:")
    for expected in expected_coordi_duplicates:
        if expected in found_coordi:
            print(f" ✓ Found expected Coordi duplicate: {expected[0]} - Episode {expected[1]}")
        else:
            print(f" ✗ Missing expected Coordi duplicate: {expected[0]} - Episode {expected[1]}")

    for expected in expected_mixed_duplicates:
        if expected in found_mixed:
            print(f" ✓ Found expected mixed duplicate: {expected[0]} - Episode {expected[1]}")
        else:
            print(f" ✗ Missing expected mixed duplicate: {expected[0]} - Episode {expected[1]}")

    print("\n=== TH URGENT Sheet - BA Confirmed Cases ===")
    th_summary = comparator.get_comparison_summary('TH URGENT')

    # Check for expected duplicates in TH URGENT
    kst_duplicates = th_summary['mismatch_details']['kst_duplicates']
    coordi_only = th_summary['mismatch_details']['coordi_only']

    expected_kst_duplicates = [
        ('백라이트', '53-1x(휴재)')
    ]

    print("KST duplicates found:")
    found_kst = []
    for item in kst_duplicates:
        key = (item['title'], item['episode'])
        found_kst.append(key)
        print(f" - {item['title']} - Episode {item['episode']}")

    # Check that 백라이트 - Episode 53-1x(휴재) doesn't appear in Coordi
    print("\nChecking that 백라이트 - Episode 53-1x(휴재) doesn't appear in Coordi:")
    found_in_coordi = False
    for item in coordi_only:
        if item['title'] == '백라이트' and item['episode'] == '53-1x(휴재)':
            found_in_coordi = True
            break

    if not found_in_coordi:
        print(" ✓ 백라이트 - Episode 53-1x(휴재) correctly does NOT appear in Coordi data")
    else:
        print(" ✗ 백라이트 - Episode 53-1x(휴재) incorrectly appears in Coordi data")

    # Verify expected cases
    print("\n✓ Verification:")
    for expected in expected_kst_duplicates:
        if expected in found_kst:
            print(f" ✓ Found expected KST duplicate: {expected[0]} - Episode {expected[1]}")
        else:
            print(f" ✗ Missing expected KST duplicate: {expected[0]} - Episode {expected[1]}")

    print("\n✓ All BA confirmed cases tested!")


if __name__ == "__main__":
    test_ba_confirmed_cases()

web_gui.py (74 lines changed)

@@ -37,28 +37,20 @@ def analyze_data():
        # Get comparison results with optional sheet filtering
        comparison_results = comparator_instance.get_comparison_summary(sheet_filter)

-        # Get matched items for display
-        categorization = comparator_instance.categorize_mismatches()
-        matched_items = list(categorization['matched_items'])
-
-        # Filter matched items by sheet if specified
-        if sheet_filter:
-            matched_items = [item for item in matched_items if item.source_sheet == sheet_filter]
-
-        # Format matched items for JSON (limit to first 500 for performance)
-        matched_data = []
-        for item in matched_items[:500]:
-            matched_data.append({
-                'title': item.title,
-                'episode': item.episode,
-                'sheet': item.source_sheet,
-                'row': item.row_index + 1,
-                'reason': 'Perfect match'
-            })
+        # Get matched items from the grouped data
+        matched_items_data = []
+        for title, items in comparison_results['grouped_by_title']['matched_by_title'].items():
+            for item in items[:500]: # Limit for performance
+                matched_items_data.append({
+                    'title': item['title'],
+                    'episode': item['episode'],
+                    'sheet': item['sheet'],
+                    'row': item['row_index'] + 1 if item['row_index'] is not None else 'N/A',
+                    'reason': 'Perfect match'
+                })

        # Add matched data to results
-        comparison_results['matched_data'] = matched_data
-        comparison_results['matched_items_count'] = len(matched_items) # Update count for filtered data
+        comparison_results['matched_data'] = matched_items_data

        return jsonify({
            'success': True,
@@ -212,7 +204,17 @@ def create_templates_dir():
        }
        .summary-card h3 {
            margin-top: 0;
+            margin-bottom: 15px;
            color: #333;
+            font-size: 1.1em;
+        }
+        .summary-card p {
+            margin: 8px 0;
+            color: #555;
+        }
+        .summary-card span {
+            font-weight: bold;
+            color: #007bff;
        }
        .count-badge {
            display: inline-block;

@@ -304,6 +306,22 @@ def create_templates_dir():
        </div>

        <div id="summary" class="tab-content active">
+            <!-- Summary Cards Section -->
+            <div class="summary-grid">
+                <div class="summary-card">
+                    <h3>📊 Sheet Summary</h3>
+                    <p><strong>Current Sheet:</strong> <span id="current-sheet-name">-</span></p>
+                    <p><strong>Matched Items:</strong> <span id="summary-matched-count">0</span> (Same in both KST and Coordi)</p>
+                    <p><strong>Different Items:</strong> <span id="summary-different-count">0</span> (Total tasks excluding matched items)</p>
+                </div>
+                <div class="summary-card">
+                    <h3>🔍 Breakdown</h3>
+                    <p><strong>KST Only:</strong> <span id="summary-kst-only">0</span></p>
+                    <p><strong>Coordi Only:</strong> <span id="summary-coordi-only">0</span></p>
+                    <p><strong>Duplicates:</strong> <span id="summary-duplicates">0</span></p>
+                </div>
+            </div>
+
            <h3>Matched Items (Same in both KST and Coordi) <span id="matched-count-display" class="count-badge">0</span></h3>
            <div class="table-container">
                <table>

@@ -519,6 +537,18 @@ def create_templates_dir():
                    (results.mismatches.mixed_duplicates_count || 0);
                document.getElementById('different-count-display').textContent = totalDifferent.toLocaleString();
+
+                // Update summary section
+                document.getElementById('current-sheet-name').textContent = results.current_sheet_filter;
+                document.getElementById('summary-matched-count').textContent = results.matched_items_count.toLocaleString();
+                document.getElementById('summary-different-count').textContent = totalDifferent.toLocaleString();
+                document.getElementById('summary-kst-only').textContent = results.mismatches.kst_only_count.toLocaleString();
+                document.getElementById('summary-coordi-only').textContent = results.mismatches.coordi_only_count.toLocaleString();
+
+                // Calculate total duplicates (KST + Coordi + Mixed)
+                const totalDuplicates = results.mismatches.kst_duplicates_count + results.mismatches.coordi_duplicates_count +
+                    (results.mismatches.mixed_duplicates_count || 0);
+                document.getElementById('summary-duplicates').textContent = totalDuplicates.toLocaleString();

                // Update Summary tab (matched items)
                updateSummaryTable(results.matched_data);

@@ -659,8 +689,8 @@ def main():
    create_templates_dir()

    print("Starting web-based GUI...")
-    print("Open your browser and go to: http://localhost:8081")
-    app.run(debug=True, host='0.0.0.0', port=8081)
+    print("Open your browser and go to: http://localhost:8080")
+    app.run(debug=True, host='0.0.0.0', port=8080)

if __name__ == "__main__":
    main()