
feat(similarity): similarity detection admin frontend #61

Open
CodFrm wants to merge 25 commits into main from test/hotfix

@CodFrm
Member

@CodFrm CodFrm commented Apr 16, 2026

No description provided.

CodFrm added 21 commits April 13, 2026 14:49
Add DiffOutlined menu entry and active-key detection for /admin/similarity.
Create similarity page, SimilarityDashboardClient tabs shell, and five
placeholder tab components (PairsTable, SuspectsTable, IntegrityReviewTable,
IntegrityWhitelistTable, PairWhitelistTable).
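Active-key detection for a nested admin route usually reduces to prefix matching on the pathname. The sketch below is not the AdminLayout code: it assumes a locale segment such as /zh-CN may precede /admin, and the menu keys are hypothetical.

```typescript
// Hypothetical menu keys; the real AdminLayout may use different ones.
const ADMIN_MENU_KEYS: readonly string[] = ['/admin/scripts', '/admin/users', '/admin/similarity'];

// Returns the menu key that matches the current pathname, ignoring an
// optional leading locale segment such as "/zh-CN" (assumed URL shape).
function getActiveAdminKey(pathname: string): string | undefined {
  const path = pathname.replace(/^\/[a-z]{2}-[A-Z]{2}(?=\/|$)/, '');
  // Match the key itself or any nested route below it, so
  // /admin/similarity/pairs/42 still highlights the Similarity entry.
  const matches = ADMIN_MENU_KEYS.filter((key) => path === key || path.startsWith(`${key}/`));
  // Prefer the most specific (longest) key if several match.
  return [...matches].sort((a, b) => b.length - a.length)[0];
}
```

Matching on `${key}/` rather than a bare prefix keeps a hypothetical sibling route like /admin/similarityx from lighting up the wrong entry.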
Typed service class wrapping all admin + public similarity endpoints:
pairs list/detail, suspects list, pair whitelist, integrity reviews,
integrity whitelist, and evidence pair detail.
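A typed service class like this is mostly thin wrappers that pin down URL and response type. A minimal sketch, assuming an injected client with get/post/delete methods (the project's real apiClient may differ) and an admin prefix of /admin/similarity; the three paths are the ones quoted elsewhere in this PR, everything else is illustrative.

```typescript
// Minimal HTTP client shape; the project's real apiClient is assumed to look similar.
interface HttpClient {
  get<T>(url: string): Promise<T>;
  post<T>(url: string, body?: unknown): Promise<T>;
  delete<T>(url: string): Promise<T>;
}

interface BackfillStatus {
  running: boolean;
  total: number;
  cursor: number;
}

class SimilarityServiceSketch {
  // Assumed admin prefix, consistent with the endpoint paths in the commit messages.
  private readonly adminBase = '/admin/similarity';
  private readonly http: HttpClient;

  constructor(http: HttpClient) {
    this.http = http;
  }

  getBackfillStatus(): Promise<BackfillStatus> {
    return this.http.get<BackfillStatus>(`${this.adminBase}/backfill/status`);
  }

  refreshStopFp(): Promise<void> {
    return this.http.post<void>(`${this.adminBase}/stop-fp/refresh`);
  }

  // Pair whitelist entries are removed by whitelist-entry ID, not by script ID.
  removePairWhitelistById(whitelistId: number): Promise<void> {
    return this.http.delete<void>(`${this.adminBase}/whitelist/${whitelistId}`);
  }
}
```

Injecting the client keeps the service testable with a fake that just records URLs.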
Merge admin.navigation.similarity, admin.similarity (tabs, columns,
status labels, action labels, drawer/modal strings), similarity.evidence
(disclaimer), and errors.integrity_rejected into the zh-CN locale file.
…orts

- integrity_score === 0 now renders as 0.00 instead of -
- merge split antd imports in SuspectsTable
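The 0 to "-" bug is the usual falsy-check pitfall in JavaScript. A sketch of the fix, assuming the score arrives as number | null | undefined (the actual field type in the API response is not shown here):

```typescript
// Format an integrity score for a table cell. Only null/undefined render as "-";
// a legitimate score of 0 must still render as "0.00".
function formatIntegrityScore(score: number | null | undefined): string {
  // A truthiness check such as `score ? score.toFixed(2) : '-'` would treat 0 as missing.
  return score == null ? '-' : score.toFixed(2);
}
```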
Implements Task 17 — pair detail page under admin/similarity/pairs/[pairId]
with metadata descriptions, whitelist action button, and a CodeDiffViewer
component that uses @monaco-editor/react DiffEditor with per-segment line
decorations via createDecorationsCollection.
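Monaco's createDecorationsCollection takes an array of IModelDeltaDecoration objects (a range plus options). The pure step of mapping match segments to that shape can be sketched without importing Monaco; the MatchSegment fields below are hypothetical, since the PR's segment type is not shown, and the class name is the similarity-match-highlight rule mentioned in a later commit.

```typescript
// Hypothetical segment shape: 1-based inclusive line ranges on each side of the diff.
interface MatchSegment {
  aStartLine: number;
  aEndLine: number;
  bStartLine: number;
  bEndLine: number;
}

// Structurally compatible with monaco.editor.IModelDeltaDecoration.
interface LineDecoration {
  range: { startLineNumber: number; startColumn: number; endLineNumber: number; endColumn: number };
  options: { isWholeLine: boolean; className: string };
}

function toLineDecorations(segments: MatchSegment[], side: 'a' | 'b'): LineDecoration[] {
  return segments.map((seg) => ({
    range: {
      startLineNumber: side === 'a' ? seg.aStartLine : seg.bStartLine,
      startColumn: 1,
      endLineNumber: side === 'a' ? seg.aEndLine : seg.bEndLine,
      endColumn: 1,
    },
    // isWholeLine highlights entire lines, so the column values above are not significant.
    options: { isWholeLine: true, className: 'similarity-match-highlight' },
  }));
}
```

Inside the DiffEditor's onMount, something like `diffEditor.getOriginalEditor().createDecorationsCollection(toLineDecorations(segments, 'a'))` would then apply one side, and the modified editor would get the 'b' side.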
Implements Task 18 — public-facing evidence page at similarity/pair/[id]
that wraps PairDetailClient with a warning Alert disclaimer banner, sourced
from getEvidencePair API endpoint.
…yte→line conversion

- replace Tailwind class names (which Tailwind purges from non-JSX strings)
  with similarity-match-highlight defined in globals.css
- convert UTF-8 byte offsets via TextEncoder to handle non-ASCII source code
  correctly, instead of iterating JS UTF-16 code units
- en-US: proper English translations
- zh-TW: Traditional Chinese translations
- ja-JP / de-DE / ru-RU: English stopgap (pending Crowdin)
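The byte-to-line fix above matters because a backend that indexes raw UTF-8 bytes and a JS string indexed in UTF-16 code units disagree as soon as the source contains non-ASCII text. A minimal sketch of the TextEncoder-based conversion, assuming offsets always fall on character boundaries; the function name is illustrative, not the one in CodeDiffViewer.

```typescript
interface EditorPosition {
  lineNumber: number; // 1-based
  column: number; // 1-based, counted in UTF-16 code units like Monaco
}

// Converts a UTF-8 byte offset into a Monaco-style line/column.
// An offset inside a multi-byte character would decode to U+FFFD and skew the column.
function byteOffsetToPosition(source: string, byteOffset: number): EditorPosition {
  const bytes = new TextEncoder().encode(source);
  const end = Math.max(0, Math.min(byteOffset, bytes.length));
  // Decode only the bytes before the offset back into a JS (UTF-16) string.
  // ignoreBOM: true keeps a leading U+FEFF so columns stay aligned with `source`.
  const prefix = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes.subarray(0, end));
  const lastNewline = prefix.lastIndexOf('\n');
  return {
    lineNumber: prefix.split('\n').length,
    column: prefix.length - lastNewline,
  };
}
```

For "你好\nab", byte offset 8 points at "b" (6 bytes for the two CJK characters, 1 for the newline, 1 for "a"), which is line 2, column 2.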
Adds the admin UI for §8.5 bootstrap operations:

- BackfillControl component in a new "回填与重扫" ("Backfill & Rescan") dashboard tab,
  polling /admin/similarity/backfill/status every 5s while running.
  Shows total/cursor/progress bar/started_at/finished_at in a
  Descriptions panel with start + restart-from-zero (§8.5 step 9) +
  refresh buttons gated by running flag.
- Manual per-script rescan card with script ID input.
- Stop-fingerprint refresh card with warning copy ("通常不需要手动触发",
  "usually does not need to be triggered manually"), invoking
  POST /admin/similarity/stop-fp/refresh; used at §8.5 step 8
  after the first full backfill completes.
- similarityService adds triggerBackfill / getBackfillStatus /
  manualScan / refreshStopFp methods and BackfillStatus type.
- New admin.similarity.tab_backfill + admin.similarity.backfill.*
  translation keys in zh-CN.
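The "poll every 5s while running" behavior can be sketched framework-agnostically as a loop that stops once the backend reports the job idle. This is not the BackfillControl code (a React component would more likely use useEffect with setInterval); the status fields follow the total/cursor/running values listed above, and the timestamp type is an assumption.

```typescript
interface BackfillStatus {
  running: boolean;
  total: number;
  cursor: number;
  started_at?: number; // assumed Unix timestamp
  finished_at?: number; // assumed Unix timestamp
}

// Progress bar value: how far the cursor has advanced through the total, clamped to 0..100.
function progressPercent(s: BackfillStatus): number {
  if (s.total <= 0) return 0;
  return Math.min(100, Math.round((s.cursor / s.total) * 100));
}

// Fetches status, reports it, and repeats every intervalMs until running is false.
async function pollBackfillStatus(
  fetchStatus: () => Promise<BackfillStatus>,
  onUpdate: (s: BackfillStatus) => void,
  intervalMs = 5000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<BackfillStatus> {
  for (;;) {
    const status = await fetchStatus();
    onUpdate(status);
    if (!status.running) return status;
    await sleep(intervalMs);
  }
}
```

Injecting `sleep` lets a test drive the loop instantly instead of waiting on real timers.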
Backend now marks each ScriptBrief with is_deleted and accepts an
exclude_deleted query param. Render deleted scripts with a strikethrough
link and red tag, and add a Switch above the table that lets admins hide
any pair whose either side has been soft-deleted.
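A sketch of how the Switch state might map onto the list request, plus the filter semantics it asks the backend to apply. The page/size parameter names, the "true" value, and the script_a/script_b field names are assumptions; only exclude_deleted and is_deleted come from the commit message.

```typescript
interface ScriptBrief {
  id: number;
  name: string;
  is_deleted: boolean;
}

// Hypothetical pair shape with one ScriptBrief per side.
interface SimilarPair {
  id: number;
  script_a: ScriptBrief;
  script_b: ScriptBrief;
}

// Builds the list query; exclude_deleted is only sent when the Switch is on.
function buildPairsQuery(page: number, size: number, excludeDeleted: boolean): string {
  const params = new URLSearchParams({ page: String(page), size: String(size) });
  if (excludeDeleted) params.set('exclude_deleted', 'true');
  return params.toString();
}

// Client-side equivalent of the backend filter: drop a pair if either side is soft-deleted.
function hideDeletedPairs(pairs: SimilarPair[]): SimilarPair[] {
  return pairs.filter((p) => !p.script_a.is_deleted && !p.script_b.is_deleted);
}
```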

Copilot AI left a comment

Pull request overview

This PR adds the frontend surface for the “similarity detection / integrity review” system, including an admin dashboard (pairs/suspects/reviews/whitelists/backfill) and a public “evidence” view for a similarity pair.

Changes:

  • Introduces similarityService API client + related response types for similarity/pair/integrity/backfill endpoints.
  • Adds new admin pages/components for viewing pairs/suspects, reviewing integrity alerts, managing whitelists, and triggering backfill/manual scans.
  • Adds a public evidence page, integrity-rejection UX in the script editor, and supporting i18n + styling.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/lib/api/services/similarity.ts | Adds similarity/integrity/backfill API service and typed models. |
| src/lib/api/errorCodes.ts | Adds an error-code constant used for integrity rejection handling. |
| src/components/ScriptEditor/index.tsx | Shows a dedicated integrity-rejection alert when submit fails with a specific error code. |
| src/components/IntegrityErrorAlert/IntegrityErrorAlert.tsx | New component to display integrity rejection details + help text. |
| src/app/globals.css | Adds global highlight style for similarity match segments (Monaco decorations). |
| src/app/[locale]/(main)/similarity/pair/[id]/page.tsx | Adds public evidence route for a similarity pair. |
| src/app/[locale]/(main)/similarity/pair/[id]/components/EvidencePageClient.tsx | Renders disclaimer + reuses pair detail UI for evidence mode. |
| src/app/[locale]/(main)/admin/similarity/page.tsx | Adds admin similarity dashboard route. |
| src/app/[locale]/(main)/admin/similarity/components/SimilarityDashboardClient.tsx | Tabs container for all similarity admin tools. |
| src/app/[locale]/(main)/admin/similarity/components/PairsTable.tsx | Admin table for similarity pairs + filter for deleted scripts. |
| src/app/[locale]/(main)/admin/similarity/components/SuspectsTable.tsx | Admin table for suspect scripts + expandable “top sources”. |
| src/app/[locale]/(main)/admin/similarity/components/IntegrityReviewTable.tsx | Admin table + detail drawer + resolve flow for integrity reviews. |
| src/app/[locale]/(main)/admin/similarity/components/ResolveReviewModal.tsx | Modal to resolve an integrity review. |
| src/app/[locale]/(main)/admin/similarity/components/IntegrityWhitelistTable.tsx | Admin CRUD UI for integrity exemptions/whitelist. |
| src/app/[locale]/(main)/admin/similarity/components/PairWhitelistTable.tsx | Admin table to remove pair whitelist entries. |
| src/app/[locale]/(main)/admin/similarity/components/BackfillControl.tsx | Admin controls for backfill, manual scan, and stop-fp refresh. |
| src/app/[locale]/(main)/admin/similarity/pairs/[pairId]/page.tsx | Adds admin route for pair detail page. |
| src/app/[locale]/(main)/admin/similarity/pairs/[pairId]/components/PairDetailClient.tsx | Fetches pair detail and renders metadata + diff + whitelist action. |
| src/app/[locale]/(main)/admin/similarity/pairs/[pairId]/components/CodeDiffViewer.tsx | Monaco diff + match segment highlighting. |
| src/app/[locale]/(main)/admin/components/AdminLayout.tsx | Adds “Similarity” entry to the admin navigation menu. |
| public/locales/zh-CN/translations.json | Adds zh-CN strings for similarity admin UI + evidence + integrity error. |
| public/locales/zh-TW/translations.json | Adds zh-TW strings for similarity admin UI + evidence + integrity error (partial). |
| public/locales/en-US/translations.json | Adds en-US strings for similarity admin UI + evidence + integrity error (partial). |
| public/locales/ru-RU/translations.json | Adds ru-RU strings for similarity admin UI + evidence + integrity error (partial). |
| public/locales/ja-JP/translations.json | Adds ja-JP strings for similarity admin UI + evidence + integrity error (partial). |
| public/locales/de-DE/translations.json | Adds de-DE strings for similarity admin UI + evidence + integrity error (partial). |
| .gitignore | Ignores .omc. |

title: t('confirm_remove_whitelist'),
onOk: async () => {
try {
await similarityService.removePairWhitelistByID(row.id);

Copilot AI Apr 16, 2026

Call site uses removePairWhitelistByID(); if the service method is renamed to consistent casing (e.g., removePairWhitelistById), update this usage accordingly to avoid churn and keep the API surface consistent.

Suggested change
- await similarityService.removePairWhitelistByID(row.id);
+ await similarityService.removePairWhitelistById(row.id);

Comment on lines +255 to +259
"similarity": {
"tab_pairs": "Pairs",
"tab_suspects": "Suspects",
"tab_integrity_reviews": "Integrity Reviews",
"tab_pair_whitelist": "Pair Whitelist",

Copilot AI Apr 16, 2026

New similarity UI references additional translation keys (e.g. tab_backfill, backfill.*, script_deleted, filter_exclude_deleted) that are not present in this locale file. Add these keys here to avoid missing-message fallbacks at runtime.

Comment on lines +216 to +220
"similarity": {
"tab_pairs": "Pairs",
"tab_suspects": "Suspects",
"tab_integrity_reviews": "Integrity Reviews",
"tab_pair_whitelist": "Pair Whitelist",

Copilot AI Apr 16, 2026

New similarity UI references additional translation keys (e.g. tab_backfill, backfill.*, script_deleted, filter_exclude_deleted) that are not present in this locale file. Add these keys here to avoid missing-message fallbacks at runtime.

Comment on lines +216 to +220
"similarity": {
"tab_pairs": "Pairs",
"tab_suspects": "Suspects",
"tab_integrity_reviews": "Integrity Reviews",
"tab_pair_whitelist": "Pair Whitelist",

Copilot AI Apr 16, 2026

New similarity UI references additional translation keys (e.g. tab_backfill, backfill.*, script_deleted, filter_exclude_deleted) that are not present in this locale file. Add these keys here to avoid missing-message fallbacks at runtime.

Comment on lines +18 to +22
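const SIGNAL_DESCRIPTIONS: Record<string, string> = {
  avg_line_length: '平均行长度过长(代码可能被压缩为少量长行)', // Average line length too long (code may be compressed into a few long lines)
  max_line_length: '最大行长度过长(存在超长代码行)', // Maximum line length too long (extremely long lines present)
  whitespace_ratio: '空白字符比例过低(代码缺少正常的空格和缩进)', // Whitespace ratio too low (code lacks normal spacing and indentation)
  comment_ratio: '注释比例过低(代码几乎没有注释)', // Comment ratio too low (code has almost no comments)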

Copilot AI Apr 16, 2026

SIGNAL_DESCRIPTIONS is hard-coded in zh-CN inside the component, but the rest of this admin UI is localized via next-intl. Consider moving these descriptions into the locale JSON (or providing per-locale strings) so non-Chinese locales don’t see mixed-language UI.

Comment on lines +49 to +50
try {
await similarityService.addPairWhitelist(pairID, 'admin whitelist');

Copilot AI Apr 16, 2026

Whitelisting currently always posts the hard-coded reason "admin whitelist". This makes the whitelist audit trail low-signal and non-localized. Consider prompting the admin for a reason (or at least using a translated default) and pass that value to the API.

Suggested change
- try {
- await similarityService.addPairWhitelist(pairID, 'admin whitelist');
+ const input = window.prompt('请输入加入白名单的原因'); // "Please enter the reason for whitelisting"
+ if (input === null) return;
+ const reason = input.trim();
+ if (!reason) {
+ message.error('请输入加入白名单的原因'); // "Please enter the reason for whitelisting"
+ return;
+ }
+ try {
+ await similarityService.addPairWhitelist(pairID, reason);

Comment on lines +216 to +220
"similarity": {
"tab_pairs": "Pairs",
"tab_suspects": "Suspects",
"tab_integrity_reviews": "Integrity Reviews",
"tab_pair_whitelist": "Pair Whitelist",

Copilot AI Apr 16, 2026

New similarity UI references additional translation keys (e.g. tab_backfill, backfill.*, script_deleted, filter_exclude_deleted) that are not present in this locale file. Add these keys here to avoid missing-message fallbacks at runtime.

Comment on lines +216 to +220
"similarity": {
"tab_pairs": "相似對",
"tab_suspects": "嫌疑腳本",
"tab_integrity_reviews": "完整性警告",
"tab_pair_whitelist": "相似對白名單",

Copilot AI Apr 16, 2026

New similarity UI references additional translation keys (e.g. tab_backfill, backfill.*, script_deleted, filter_exclude_deleted) that are not present in this locale file. Add these keys here to avoid missing-message fallbacks at runtime.

Comment on lines +175 to +176
removePairWhitelistByID(whitelistID: number) {
return apiClient.delete<void>(`${this.adminBase}/whitelist/${whitelistID}`);

Copilot AI Apr 16, 2026

Method name uses "ByID"/"whitelistID" which is inconsistent with the rest of the codebase's Id/ID casing and makes call sites noisier. Consider renaming to removePairWhitelistById(whitelistId) (and updating callers) for consistency.

Suggested change
- removePairWhitelistByID(whitelistID: number) {
- return apiClient.delete<void>(`${this.adminBase}/whitelist/${whitelistID}`);
+ removePairWhitelistById(whitelistId: number) {
+ return apiClient.delete<void>(`${this.adminBase}/whitelist/${whitelistId}`);

@CodFrm
Member Author

CodFrm commented Apr 16, 2026

Code review

Reviewed the diff and also evaluated Copilot's 9 review comments. Found 2 issues that align with Copilot's observations:

  1. SIGNAL_DESCRIPTIONS is hardcoded in Chinese, bypassing the i18n system (CLAUDE.md says "prefer useTranslations() / getTranslations() for user-facing text")

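const SIGNAL_DESCRIPTIONS: Record<string, string> = {
  avg_line_length: '平均行长度过长(代码可能被压缩为少量长行)', // Average line length too long (code may be compressed into a few long lines)
  max_line_length: '最大行长度过长(存在超长代码行)', // Maximum line length too long (extremely long lines present)
  whitespace_ratio: '空白字符比例过低(代码缺少正常的空格和缩进)', // Whitespace ratio too low (code lacks normal spacing and indentation)
  comment_ratio: '注释比例过低(代码几乎没有注释)', // Comment ratio too low (code has almost no comments)
  single_char_ident_ratio: '单字符变量名比例过高(变量名被缩短为单个字符)', // Too many single-character identifiers (names shortened to one character)
  hex_ident_ratio: '十六进制变量名比例过高(使用了 _0x 开头的混淆变量名)', // Too many hex identifiers (obfuscated names starting with _0x)
  large_string_array: '检测到大型字符串数组(常见于混淆工具的字符串表)', // Large string array detected (typical of obfuscator string tables)
  dean_edwards_packer: '检测到 Dean Edwards 打包器', // Dean Edwards packer detected
  aa_encode: '检测到 AAEncode 编码', // AAEncode encoding detected
  jj_encode: '检测到 JJEncode 编码', // JJEncode encoding detected
  eval_density: 'eval/动态执行调用密度过高', // Density of eval/dynamic-execution calls too high
};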

The component already uses useTranslations('admin.similarity') for all other strings, but these 11 signal descriptions are hardcoded Chinese that will display regardless of locale. They should be moved to the translation files.

  2. 31 translation keys (tab_backfill, backfill.*, script_deleted, filter_exclude_deleted) exist in zh-CN but are missing from all 5 other locale files (en-US, de-DE, ja-JP, ru-RU, zh-TW). Phase 3 keys were manually added to all locales in commit 3dc4eb8, but Phase 4 keys were not given the same treatment.

"script_deleted": "已删除",
"filter_exclude_deleted": "隐藏含已删除脚本的对",

"tab_backfill": "回填与重扫",
"backfill": {
"help_title": "历史脚本回填",
"help_body": "系统上线后,只有新发布或更新的脚本会自动进入相似度扫描。要让历史脚本也参与比对,需要手动触发一次回填。回填会为每个脚本投递一条扫描消息,由后台 consumer 异步处理,过程中可安全离开此页面。",
"status_title": "回填状态",
"label_running": "状态",
"label_total": "总数",
"label_cursor": "进度",
"label_progress": "完成度",
"label_started_at": "开始时间",
"label_finished_at": "结束时间",
"state_running": "进行中",
"state_idle": "空闲",
"btn_start": "启动回填",
"btn_restart": "从头回填",
"btn_refresh": "刷新状态",
"confirm_start_title": "确认启动回填?",
"confirm_start_body": "将从上次暂停的游标继续投递扫描消息。",
"confirm_restart_title": "确认从头回填?",
"confirm_restart_body": "游标会重置为 0,全库脚本都会被重新扫描一次。通常只在首次上线或 stop-fp 列表刷新后需要这么做。",
"msg_started": "回填任务已启动",
"manual_scan_title": "手动重扫单个脚本",
"manual_scan_placeholder": "输入脚本 ID",
"btn_manual_scan": "发送扫描",
"msg_manual_scan_published": "扫描消息已投递",
"stop_fp_title": "Stop-fingerprint 刷新",
"stop_fp_warn_title": "通常不需要手动触发",
"stop_fp_warn_body": "Stop-fp 列表每小时由定时任务自动刷新。仅在首次上线完成全库回填后(§8.5 第 8 步)手动触发一次,使 Jaccard 计算过滤掉公共模板代码。",
"btn_stop_fp_refresh": "立即刷新",
"msg_stop_fp_refreshed": "Stop-fingerprint 集合已刷新"
}
}
},

These keys are used by BackfillControl.tsx, SimilarityDashboardClient.tsx, and PairsTable.tsx. Non-Chinese admins will see blank/missing text in the Backfill tab and the deleted-script filter.
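Drift like this (keys added only to zh-CN) is easy to catch mechanically. A sketch of a check that could run over public/locales/*/translations.json with zh-CN as the reference; it is not part of this PR, and reading the files from disk is left out.

```typescript
type Messages = { [key: string]: string | Messages };

// Flattens nested message objects into dotted key paths, e.g. "backfill.btn_start".
function flattenKeys(messages: Messages, prefix = ''): string[] {
  return Object.entries(messages).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return typeof value === 'string' ? [path] : flattenKeys(value, path);
  });
}

// Keys present in the reference locale but absent from the target locale.
function missingKeys(reference: Messages, target: Messages): string[] {
  const have = new Set(flattenKeys(target));
  return flattenKeys(reference).filter((k) => !have.has(k));
}
```

Run against the `admin.similarity` subtree of each locale, a non-empty result would flag exactly the 31 keys listed above before they reach non-Chinese admins.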


Copilot comment assessment:

  • Hardcoded Chinese SIGNAL_DESCRIPTIONS -- agree, see issue 1 above.
  • Missing translation keys in 5 locales -- agree, see issue 2 above.
  • Hardcoded 'admin whitelist' reason in PairDetailClient.tsx -- valid UX suggestion (inconsistent with IntegrityWhitelistTable which has a reason form), but not a bug.
  • removePairWhitelistByID casing inconsistency -- minor naming style issue, not a bug.
  • removeIntegrityWhitelist(row.script.id) -- not a bug. The addIntegrityWhitelist method also uses script_id, confirming the backend indexes the integrity whitelist by script ID (unlike pair whitelist which uses entry ID).

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

CodFrm added 4 commits April 16, 2026 16:09
…ngs to i18n

Replace the hardcoded SIGNAL_DESCRIPTIONS constant in IntegrityReviewTable with
next-intl t() calls under admin.similarity.signal_desc.*, adding translations
for all 6 locales (zh-CN, en-US, de-DE, ja-JP, ru-RU, zh-TW).
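With the strings under admin.similarity.signal_desc.*, the component only needs a lookup. A minimal sketch, assuming `t` is the translator returned by useTranslations('admin.similarity') (typed loosely here) and that unknown signal names should fall back to the raw name rather than a missing-message key; the helper names are illustrative.

```typescript
// The 11 signal names from the original SIGNAL_DESCRIPTIONS constant.
const SIGNAL_KEYS = [
  'avg_line_length', 'max_line_length', 'whitespace_ratio', 'comment_ratio',
  'single_char_ident_ratio', 'hex_ident_ratio', 'large_string_array',
  'dean_edwards_packer', 'aa_encode', 'jj_encode', 'eval_density',
] as const;
type SignalKey = (typeof SIGNAL_KEYS)[number];

function isSignalKey(name: string): name is SignalKey {
  return (SIGNAL_KEYS as readonly string[]).includes(name);
}

// Known signals resolve through i18n; a signal the backend adds later shows its raw name.
function describeSignal(t: (key: string) => string, signal: string): string {
  return isSignalKey(signal) ? t(`signal_desc.${signal}`) : signal;
}
```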
Backfills script_deleted, filter_exclude_deleted, tab_backfill, and the
full backfill sub-object (28 keys) into en-US, de-DE, ja-JP, ru-RU, and
zh-TW — inserted before signal_desc to maintain consistent key ordering.
…nents

Move these components from admin route directory to src/components/similarity/
so both the admin detail page and public evidence page import from the same
shared location, eliminating the cross-layer dependency.

Labels

None yet

Projects

None yet

2 participants