Add SQLite-backed registry with JSON migration support by kdush · Pull Request #15 · VectifyAI/OpenKB

kdush · 2026-04-11T04:39:37Z

Summary

Add a SQLite-backed registry as the default storage backend, with automatic migration from the existing JSON registry.

Changes

- Add a `DbRegistry` class backed by SQLite, with WAL mode and JSON migration
- Add a `storage_backend` config option (`sqlite` | `json`)
- Update CLI commands to use the `get_registry()` factory
- Document the storage backend options and migration behavior
- Add tests for the SQLite registry, migration, and backend selection
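A minimal sketch of what a SQLite-backed registry along these lines could look like, assuming the registry maps a document's content hash to a small JSON metadata record. Only the class name `DbRegistry` and WAL mode come from this PR; the `hashes` table layout and the `put`/`get`/`all` methods are illustrative assumptions, not the PR's actual code.

```python
import json
import sqlite3
from pathlib import Path
from typing import Any, Optional


class DbRegistry:
    """Hypothetical SQLite registry: content hash -> JSON metadata (schema is assumed)."""

    def __init__(self, db_path: Path) -> None:
        self.conn = sqlite3.connect(Path(db_path))
        # WAL mode: readers keep working while a writer appends to the -wal file.
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes (hash TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        self.conn.commit()

    def put(self, file_hash: str, data: dict[str, Any]) -> None:
        # INSERT OR REPLACE: re-adding the same document overwrites its metadata.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO hashes (hash, data) VALUES (?, ?)",
                (file_hash, json.dumps(data)),
            )

    def get(self, file_hash: str) -> Optional[dict[str, Any]]:
        row = self.conn.execute(
            "SELECT data FROM hashes WHERE hash = ?", (file_hash,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def __contains__(self, file_hash: str) -> bool:
        return self.get(file_hash) is not None

    def all(self) -> dict[str, dict[str, Any]]:
        rows = self.conn.execute("SELECT hash, data FROM hashes").fetchall()
        return {h: json.loads(d) for h, d in rows}

    def close(self) -> None:
        self.conn.close()
```

Keeping the payload as a JSON text column stays close to the old `hashes.json` layout, at the cost of not being able to query individual metadata fields in SQL.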

Testing

- 25 new tests for the SQLite backend and migration pass
- All existing tests pass
- End-to-end verified: `init`, `add`, `query`, `status`, `list`
- Confirmed that SQLite creates `hashes.db`, `hashes.db-wal`, and `hashes.db-shm`
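The `-wal` and `-shm` files are standard SQLite write-ahead-log artifacts rather than files OpenKB creates itself. This standalone check (standard library only; the table name `t` is arbitrary) reproduces the observation: the sidecar files exist while a WAL-mode connection that has written is open, and SQLite normally removes them when the last connection closes cleanly.

```python
import sqlite3
import tempfile
from pathlib import Path


def wal_sidecars(db_path: Path) -> list[str]:
    """Open db_path in WAL mode, write one row, and list the files present."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # Inspect the directory while the connection is still open.
        return sorted(p.name for p in db_path.parent.iterdir())
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_sidecars(Path(tmp) / "hashes.db"))
```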

Backward Compatibility

- `storage_backend: json` still works for existing setups
- Switching to SQLite automatically migrates `hashes.json` to `hashes.db`
- `hashes.json` is preserved after migration for safety
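One way the migration and backend selection could fit together, sketched under assumptions: `hashes.json` is taken to be a flat JSON object mapping hash to metadata, migration runs only when `hashes.db` does not exist yet, and `JsonRegistry`/`SqliteRegistry` are hypothetical minimal stand-ins. Only `get_registry()`, `storage_backend`, and the two file names come from this PR.

```python
import json
import sqlite3
from pathlib import Path
from typing import Any


class JsonRegistry:
    """Hypothetical legacy backend: the whole registry is one JSON object."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def all(self) -> dict[str, Any]:
        return json.loads(self.path.read_text()) if self.path.exists() else {}


class SqliteRegistry:
    """Hypothetical SQLite backend exposing the same read interface."""

    def __init__(self, path: Path) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes (hash TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        self.conn.commit()

    def all(self) -> dict[str, Any]:
        rows = self.conn.execute("SELECT hash, data FROM hashes").fetchall()
        return {h: json.loads(d) for h, d in rows}


def migrate_json_to_sqlite(json_path: Path, db_path: Path) -> SqliteRegistry:
    """Copy hashes.json into a new hashes.db; the JSON file is never modified."""
    first_run = not db_path.exists()
    registry = SqliteRegistry(db_path)
    if first_run and json_path.exists():
        entries = json.loads(json_path.read_text())
        with registry.conn:  # single transaction: all rows or none
            registry.conn.executemany(
                "INSERT OR REPLACE INTO hashes (hash, data) VALUES (?, ?)",
                [(h, json.dumps(meta)) for h, meta in entries.items()],
            )
    return registry


def get_registry(kb_dir: Path, storage_backend: str = "sqlite"):
    """Pick a backend from the storage_backend config value (sqlite | json)."""
    if storage_backend == "json":
        return JsonRegistry(kb_dir / "hashes.json")
    if storage_backend == "sqlite":
        return migrate_json_to_sqlite(kb_dir / "hashes.json", kb_dir / "hashes.db")
    raise ValueError(f"unknown storage_backend: {storage_backend!r}")
```

Gating on "the database file did not exist" keeps the sketch simple, but if the import transaction fails, the next run will not retry; a sturdier variant could record a migration marker inside the database in the same transaction.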

…klinks
- Add concept dedup with briefs and _read_concept_briefs context
- Add concepts plan and update prompt templates with create/update/related paths
- Extract shared _compile_concepts from compile_short_doc and compile_long_doc
- Add bidirectional backlinks between summaries and concepts
- Code review fixes: security, robustness, tests, and CI hardening

Co-authored-by: Ray <mailtangyu@gmail.com>

- Add get_page_content tool and parse_pages helper for page-level access
- Store long doc sources as per-page JSON extracted by pymupdf
- Unify summary frontmatter to doc_type + full_text fields
- Update schema and tree renderer for new frontmatter format
- All image paths use sources/images/ prefix relative to wiki root

Co-authored-by: Ray <mailtangyu@gmail.com>
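The commit does not show the storage format or the signatures of `get_page_content` and `parse_pages`. Purely as an illustration, here is one way per-page JSON plus page-level access could work, assuming each long document is stored as a JSON list of `{"page": n, "content": ...}` records and page requests arrive as strings like `"3-5,8"`; `save_pages`, `parse_page_spec`, and `read_pages` are hypothetical names.

```python
import json
from pathlib import Path


def save_pages(path: Path, pages: list[str]) -> None:
    """Store one record per page (1-based page numbers)."""
    records = [{"page": i, "content": text} for i, text in enumerate(pages, start=1)]
    path.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")


def parse_page_spec(spec: str) -> list[int]:
    """Turn a spec such as "3-5,8" into sorted, de-duplicated page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"bad range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def read_pages(path: Path, spec: str) -> str:
    """Return only the requested pages, each prefixed with its page number."""
    wanted = set(parse_page_spec(spec))
    records = json.loads(path.read_text(encoding="utf-8"))
    return "\n\n".join(
        f"[page {r['page']}]\n{r['content']}" for r in records if r["page"] in wanted
    )
```

Storing pages separately lets a query agent pull a few pages into context instead of the whole document.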

- Move warning suppression after imports to avoid markitdown override
- Improve init prompts with explicit defaults
- Use American English throughout (initialized, normalized, Synthesize)
- Replace unicode ellipsis with ASCII
- Remove empty explorations/reports dirs from init
- Fix test isolation for _find_kb_dir

- Add get_image tool for viewing images referenced in source documents
- Use ToolOutputImage for proper image content in LLM context
- Update prompt: use full_text field, restrict get_page_content to pageindex
- Add self-talk before tool calls, enforce concise answers
- Prevent duplicate frontmatter in LLM-generated content via schema update

Drop the language and pageindex_threshold prompts from `openkb init`; both fall back to config defaults and can be edited later in `.openkb/config.yaml`. In their place, add an interactive API key prompt that writes `LLM_API_KEY` to `./.env` (chmod 0600) when the user provides one, so first-time setup no longer requires a separate manual step. Also polish the model prompt with provider examples and a link to LiteLLM for others.
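A minimal sketch of writing the key with owner-only permissions. Only the `LLM_API_KEY` name, the `./.env` location, and mode `0600` come from the description above; the helper name `save_api_key` and the choice to replace an existing `LLM_API_KEY=` line while keeping other variables are assumptions.

```python
import os
from pathlib import Path


def save_api_key(api_key: str, env_path: Path = Path(".env")) -> None:
    """Write LLM_API_KEY to env_path, keeping any other variables already there."""
    lines: list[str] = []
    if env_path.exists():
        lines = [
            line
            for line in env_path.read_text().splitlines()
            if not line.startswith("LLM_API_KEY=")
        ]
        # Tighten a pre-existing file before the key is written into it.
        os.chmod(env_path, 0o600)
    lines.append(f"LLM_API_KEY={api_key}")
    # A newly created file gets 0600 (the umask can only remove bits from this).
    fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Passing the mode to `os.open` avoids a window in which a freshly created `.env` holding the key is readable by other users, which a plain write followed by `chmod` would leave open.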

When PAGEINDEX_API_KEY is set, index_long_document now fetches per-page markdown via col.get_page_content() instead of running local pymupdf. Cloud OCR produces cleaner output (preserves tables, math, and section headers) than raw pymupdf text extraction. Falls back to local pymupdf if the cloud call raises or returns an empty result.
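The fallback order described above can be sketched as follows. `col.get_page_content()` is the call named in the commit, but its arguments and return type are not shown here, so the sketch takes the cloud and local extractors as injected callables; `extract_pages`, `cloud_fetch`, and `local_extract` are hypothetical names.

```python
import logging
import os
from typing import Callable, Optional

log = logging.getLogger(__name__)

Pages = list[str]


def extract_pages(
    pdf_path: str,
    cloud_fetch: Callable[[str], Optional[Pages]],
    local_extract: Callable[[str], Pages],
    api_key: Optional[str] = None,
) -> Pages:
    """Prefer cloud OCR when a PageIndex key is configured; otherwise extract locally."""
    key = api_key if api_key is not None else os.environ.get("PAGEINDEX_API_KEY")
    if key:
        try:
            pages = cloud_fetch(pdf_path)
        except Exception as exc:  # any cloud failure falls through to the local path
            log.warning("cloud page extraction failed, using local fallback: %s", exc)
            pages = None
        if pages:  # an empty result also triggers the fallback
            return pages
    return local_extract(pdf_path)
```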

Move warnings.filterwarnings("ignore") to before the module imports so pydub's missing-ffmpeg RuntimeWarning, emitted when markitdown pulls it in, is suppressed. The existing post-import call is kept because markitdown clobbers the filter state during its own import.
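The ordering argument can be checked without installing markitdown or pydub. The stand-in below is hypothetical (its message text is illustrative): it emits a `RuntimeWarning` at "import" time and then resets the global filter list, which is the combination of behaviors the commit describes. With only the post-import filter the import-time warning escapes; with only the pre-import filter later warnings escape; with both, nothing is shown.

```python
import warnings


def fake_dependency_import() -> None:
    """Hypothetical stand-in for `import markitdown`."""
    # A transitive dependency warns while it is being imported...
    warnings.warn("Couldn't find ffmpeg or avconv", RuntimeWarning)
    # ...and the package then overwrites the global filter state.
    warnings.resetwarnings()


def run(filter_before: bool, filter_after: bool) -> list[str]:
    """Simulate the import under a given filter placement; return the warnings shown."""
    with warnings.catch_warnings(record=True) as shown:
        warnings.simplefilter("always")  # baseline: show everything
        if filter_before:
            warnings.filterwarnings("ignore")
        fake_dependency_import()
        if filter_after:
            warnings.filterwarnings("ignore")
        warnings.warn("emitted later at runtime", UserWarning)
    return [str(w.message) for w in shown]
```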

KylinMountain and others added 30 commits April 10, 2026 08:04

fix: default model, API key warning, config and CI improvements

0bf7084

- Change default model to gpt-5.4-mini
- Warn when no LLM API key found instead of failing silently
- Fix CI publish workflow and test isolation

Co-authored-by: Ray <mailtangyu@gmail.com>
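A minimal sketch of "warn instead of failing silently". The commit does not say which variables are checked: `LLM_API_KEY` appears elsewhere in this PR, while the provider-specific names below are assumptions chosen to illustrate LiteLLM-style setups, and `check_llm_api_key` is a hypothetical helper. It prints to stderr rather than calling `warnings.warn`, because the CLI globally ignores the `warnings` module (see the `filterwarnings("ignore")` change in this PR).

```python
import sys
from collections.abc import Mapping
from typing import Optional, TextIO

# Assumed set of variable names; the real check may look at different ones.
KEY_VARS = ("LLM_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def check_llm_api_key(env: Mapping[str, str], out: Optional[TextIO] = None) -> bool:
    """Return True if any key is set; otherwise print a visible warning and return False."""
    if any(env.get(name) for name in KEY_VARS):
        return True
    print(
        "Warning: no LLM API key found (checked "
        + ", ".join(KEY_VARS)
        + "). Set one in .env before running LLM-backed commands.",
        file=out or sys.stderr,
    )
    return False
```

Typical use would be `check_llm_api_key(os.environ)` at CLI startup.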

refactor: unify image paths and add pymupdf per-page extraction

44bf83e

- Add convert_pdf_to_pages for per-page content+image extraction
- All image paths use sources/images/ prefix relative to wiki root
- Remove page marker comments from short doc source markdown

fix: replace concept body on update instead of appending

7ca95f9

The _CONCEPT_UPDATE_USER prompt asks the LLM for a full rewrite, but _write_concept was appending the rewrite to the existing body, causing content duplication on every concept update.
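A sketch of the difference, assuming concept pages are Markdown files with a `---`-delimited frontmatter block followed by the body. `write_concept_body` is a hypothetical stand-in for `_write_concept`: it keeps the existing frontmatter and swaps in the LLM's full rewrite as the new body.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a leading ---...--- frontmatter block from the body."""
    if text.startswith("---\n"):
        end = text.find("\n---\n", 3)
        if end != -1:
            return text[: end + 5], text[end + 5 :]
    return "", text


def write_concept_body(existing: str, rewritten_body: str) -> str:
    """Keep the frontmatter and replace the whole body with the LLM's rewrite."""
    frontmatter, _old_body = split_frontmatter(existing)
    # The buggy variant returned frontmatter + _old_body + rewritten_body, which
    # duplicates content on every update because the rewrite already covers it.
    return frontmatter + rewritten_body.lstrip("\n")
```

Because the output depends only on the frontmatter and the new rewrite, repeated updates no longer grow the page.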

fix: use json_repair for robust LLM JSON parsing

d41588a

Replace hand-rolled fence stripping with json_repair to handle malformed JSON, missing fences, and prose-wrapped responses from LLMs. Also fixes str.index() ValueError on fenced blocks without newlines.
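The `str.index()` failure mode is easy to reproduce. The function below is a hypothetical reconstruction of hand-rolled fence stripping, not the project's removed code: it assumes the opening fence is followed by a newline, so a single-line fenced reply makes `index()` raise `ValueError` (where `str.find()` would have returned -1).

```python
import json


def strip_fences_naive(text: str) -> str:
    """Hypothetical hand-rolled fence stripper of the kind the commit replaced."""
    text = text.strip()
    if text.startswith("```"):
        # Assumes "```json\n...": raises ValueError when there is no newline.
        text = text[text.index("\n") + 1 :]
        if text.endswith("```"):
            text = text[:-3]
    return text


def parse_llm_json_naive(text: str):
    return json.loads(strip_fences_naive(text))
```

With the `json_repair` package, the call site becomes roughly `json_repair.loads(raw)`, which the commit relies on to tolerate fences, prose wrappers, and malformed JSON.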

fix: use pdf_path.stem for full_text frontmatter path

7dd70c6

fix: sanitize concept names before links and index

b90f0b4

fix: pass doc_type and doc_brief in early return paths

3dd84f3

fix: sanitize concept name in _gen_update and correct _update_index docstring

ef60f7d

Merge pull request VectifyAI#10 from VectifyAI/bugfix/compile-clean

85eaebf

feat: compile pipeline, query agent, and multimodal improvements

fix: update existing concept briefs in index.md instead of skipping

8818ada
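A sketch of "update instead of skip" combined with exact row matching, assuming `index.md` lists concepts as lines of the form `- [[concepts/<slug>]]: <brief>`; that row format and the `upsert_brief` helper are assumptions. Matching on the full link target, rather than a substring, keeps an update to `attention` from touching `self-attention`.

```python
import re


def upsert_brief(index_md: str, slug: str, brief: str) -> str:
    """Replace the brief on the row for `slug`, or append a new row if none exists."""
    row = f"- [[concepts/{slug}]]: {brief}"
    # Exact match on the whole link target: [[concepts/attention]] must not match
    # [[concepts/self-attention]] or [[concepts/attention-heads]].
    pattern = re.compile(rf"^- \[\[concepts/{re.escape(slug)}\]\]:.*$", re.MULTILINE)
    if pattern.search(index_md):
        # A function replacement avoids re-interpreting backslashes in the brief.
        return pattern.sub(lambda _m: row, index_md, count=1)
    sep = "" if index_md.endswith("\n") or not index_md else "\n"
    return f"{index_md}{sep}{row}\n"
```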

fix: preserve non-ASCII characters in concept name slugs

aabcf5f
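A minimal slug function in that spirit, offered as an assumption rather than the project's own implementation (`concept_slug` is a hypothetical name). Python's `\w` is Unicode-aware for `str` patterns, so dropping everything except word characters, whitespace, and hyphens removes path separators and punctuation while keeping letters such as `é` or CJK characters.

```python
import re
import unicodedata


def concept_slug(name: str) -> str:
    """Lowercase, drop punctuation and path separators, and join words with hyphens."""
    # NFC first: a decomposed "e" + combining accent would otherwise lose the
    # accent, because combining marks are not matched by \w.
    name = unicodedata.normalize("NFC", name)
    # \w in a str pattern matches Unicode letters and digits, not just ASCII.
    cleaned = re.sub(r"[^\w\s-]", "", name.strip().lower())
    return re.sub(r"[\s_-]+", "-", cleaned).strip("-")
```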

fix: always replace concept body on update, not only when source is new

9df6e6c

Fix concept index updates by section

ef235d2

Fix exact concept index row matching

3e3d56f

Fix exact concept index row matching

ed0d6ba

Revert "Fix exact concept index row matching"

b6f6ba3

This reverts commit 3e3d56f.

Merge pull request VectifyAI#11 from VectifyAI/bugfix/compiler-update-fixes

2a15587

fix: compiler concept update bugs

Merge pull request VectifyAI#12 from VectifyAI/dev

0291ec9

Release: merge dev into main

Merge pull request VectifyAI#13 from VectifyAI/feat/init-api-key-prompt

771452d

Simplify init prompts and capture API key to .env

Bump pageindex to 0.3.0.dev1

e0ab3f9

Picks up the cloud add_document poll fix from VectifyAI/PageIndex#226, which switches the readiness signal from retrieval_ready to status == "completed".
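The PageIndex client code is not part of this PR, so the following is only a generic illustration of polling on a terminal status value rather than an intermediate flag. `wait_until_completed`, the `get_status` callable, and the dict shape are hypothetical; only the `retrieval_ready` flag and the `"completed"` status come from the commit message above.

```python
import time
from typing import Any, Callable


def wait_until_completed(
    get_status: Callable[[], dict[str, Any]],
    timeout: float = 600.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the document reports status == "completed" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        info = get_status()
        # Readiness is judged by the status field, as in the fix,
        # not by a retrieval_ready flag.
        if info.get("status") == "completed":
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document not completed after {timeout}s: {info!r}")
        sleep(interval)
```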

Merge pull request VectifyAI#14 from VectifyAI/dev

2e1caf9

Cloud OCR indexing, pageindex dev1 bump, warning cleanup

feat: add SQLite-backed registry

fde9b6d

feat: add SQLite backend and migration tests

6dad765

docs: document storage backend and migration

9436ad6

kdush force-pushed the feat/sqlite-storage-backend branch from a56ee15 to 9436ad6 on April 11, 2026 04:50

rejojer force-pushed the main branch from 726336a to 8658d1c on April 11, 2026 14:34

kdush wants to merge 31 commits into VectifyAI:main from kdush:feat/sqlite-storage-backend

3 participants
