Add SQLite-backed registry with JSON migration support #15
Open
kdush wants to merge 31 commits into VectifyAI:main from
…klinks
- Add concept dedup with briefs and _read_concept_briefs context
- Add concepts plan and update prompt templates with create/update/related paths
- Extract shared _compile_concepts from compile_short_doc and compile_long_doc
- Add bidirectional backlinks between summaries and concepts
- Code review fixes: security, robustness, tests, and CI hardening
Co-authored-by: Ray <mailtangyu@gmail.com>

- Add get_page_content tool and parse_pages helper for page-level access
- Store long doc sources as per-page JSON extracted by pymupdf
- Unify summary frontmatter to doc_type + full_text fields
- Update schema and tree renderer for new frontmatter format
- All image paths use sources/images/ prefix relative to wiki root
Co-authored-by: Ray <mailtangyu@gmail.com>

- Change default model to gpt-5.4-mini
- Warn when no LLM API key found instead of failing silently
- Fix CI publish workflow and test isolation
Co-authored-by: Ray <mailtangyu@gmail.com>

- Move warning suppression after imports to avoid markitdown override
- Improve init prompts with explicit defaults
- Use American English throughout (initialized, normalized, Synthesize)
- Replace unicode ellipsis with ASCII
- Remove empty explorations/reports dirs from init
- Fix test isolation for _find_kb_dir

- Add get_image tool for viewing images referenced in source documents
- Use ToolOutputImage for proper image content in LLM context
- Update prompt: use full_text field, restrict get_page_content to pageindex
- Add self-talk before tool calls, enforce concise answers
- Prevent duplicate frontmatter in LLM-generated content via schema update

- Add convert_pdf_to_pages for per-page content+image extraction
- All image paths use sources/images/ prefix relative to wiki root
- Remove page marker comments from short doc source markdown
The _CONCEPT_UPDATE_USER prompt asks the LLM for a full rewrite, but _write_concept was appending the rewrite to the existing body, causing content duplication on every concept update.
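The difference between appending and replacing can be sketched as follows. The real `_write_concept` signature is not shown in this PR, so `write_concept`, `split_frontmatter`, and the file layout (frontmatter between `---` lines, then a body) are assumptions for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split a markdown file into (frontmatter block, body).

    Assumes frontmatter is delimited by a leading '---' line and a closing
    '---' line; returns an empty frontmatter if none is found.
    """
    if text.startswith("---\n"):
        end = text.find("\n---\n", 4)
        if end != -1:
            cut = end + len("\n---\n")
            return text[:cut], text[cut:]
    return "", text


def write_concept(path: Path, new_body: str) -> None:
    """Hypothetical stand-in for _write_concept: keep frontmatter, replace body."""
    frontmatter = ""
    if path.exists():
        frontmatter, _old_body = split_frontmatter(path.read_text(encoding="utf-8"))
    # The LLM returns a full rewrite, so the old body is discarded, not appended to.
    path.write_text(frontmatter + new_body.strip() + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    concept = Path(tmp) / "attention.md"
    concept.write_text("---\ntitle: Attention\n---\nOld body.\n", encoding="utf-8")
    write_concept(concept, "Rewritten body.")
    write_concept(concept, "Rewritten body.")  # a second update must not duplicate
    result = concept.read_text(encoding="utf-8")
```

Running the update twice leaves a single copy of the rewritten body, which is the behavior the commit message describes as the intended one.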
Replace hand-rolled fence stripping with json_repair to handle malformed JSON, missing fences, and prose-wrapped responses from LLMs. Also fixes str.index() ValueError on fenced blocks without newlines.
feat: compile pipeline, query agent, and multimodal improvements
This reverts commit 3e3d56f.
…-fixes fix: compiler concept update bugs
Release: merge dev into main
Drop the language and pageindex_threshold prompts from `openkb init`; both fall back to config defaults and can be edited later in `.openkb/config.yaml`. In their place, add an interactive API key prompt that writes `LLM_API_KEY` to `./.env` (chmod 0600) when the user provides one, so first-time setup no longer requires a separate manual step. Also polish the model prompt with provider examples and a link to LiteLLM for others.
Simplify init prompts and capture API key to .env
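A minimal sketch of writing the key with owner-only permissions, assuming a POSIX filesystem. The helper name `write_api_key` is hypothetical, and this version overwrites any existing file; the PR does not say whether the real command overwrites or appends.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_api_key(env_path: Path, api_key: str) -> None:
    """Write LLM_API_KEY to env_path so only the owner can read or write it."""
    # Request mode 0600 at creation time so a new file is never briefly
    # readable by other users. O_TRUNC overwrites an existing file.
    fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(f"LLM_API_KEY={api_key}\n")
    # The mode argument is ignored when the file already existed,
    # so set the permissions explicitly as well.
    os.chmod(env_path, 0o600)


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    write_api_key(env_file, "sk-example-not-a-real-key")
    mode = stat.S_IMODE(env_file.stat().st_mode)
    contents = env_file.read_text(encoding="utf-8")
```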
When PAGEINDEX_API_KEY is set, index_long_document now fetches per-page markdown via col.get_page_content() instead of running local pymupdf. Cloud OCR produces cleaner output (preserves tables, math, and section headers) than raw pymupdf text extraction. Falls back to local pymupdf if the cloud call raises or returns an empty result.
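The fallback rule described here (use the cloud result unless the call raises or returns nothing) could look roughly like this. `pages_with_fallback` and the stand-in fetchers are hypothetical; the actual `index_long_document` and `col.get_page_content()` signatures are not shown in this PR, and the `PAGEINDEX_API_KEY` check is left out for brevity.

```python
from typing import Callable, List

PageFetcher = Callable[[str], List[str]]


def pages_with_fallback(doc_id: str, cloud: PageFetcher, local: PageFetcher) -> List[str]:
    """Prefer cloud OCR pages; fall back to local extraction on error or empty result."""
    try:
        pages = cloud(doc_id)
    except Exception:
        # Any failure in the cloud call (network, auth, ...) triggers the fallback.
        pages = []
    return pages if pages else local(doc_id)


# Hypothetical stand-ins for the cloud call and the local pymupdf extraction.
def failing_cloud(doc_id: str) -> List[str]:
    raise RuntimeError("cloud unavailable")


def empty_cloud(doc_id: str) -> List[str]:
    return []


def working_cloud(doc_id: str) -> List[str]:
    return ["# Page 1 (cloud)"]


def local_extract(doc_id: str) -> List[str]:
    return ["Page 1 (local)"]


from_error = pages_with_fallback("doc", failing_cloud, local_extract)
from_empty = pages_with_fallback("doc", empty_cloud, local_extract)
from_cloud = pages_with_fallback("doc", working_cloud, local_extract)
```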
Picks up the cloud add_document poll fix from VectifyAI/PageIndex#226, which switches the readiness signal from retrieval_ready to status == "completed".
Move warnings.filterwarnings("ignore") to before the module imports
so pydub's missing-ffmpeg RuntimeWarning, emitted when markitdown
pulls it in, is suppressed. The existing post-import call is kept
because markitdown clobbers the filter state during its own import.
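Why both calls are needed can be shown with a small simulation. `noisy_import` is a hypothetical stand-in for an import that emits a warning and then resets the filter state; it does not reproduce what markitdown or pydub actually do internally.

```python
import warnings


def noisy_import() -> None:
    """Stand-in for importing a library that warns, then clobbers the filters."""
    warnings.warn("ffmpeg not found", RuntimeWarning)
    warnings.resetwarnings()  # simulates the library resetting filter state


# Only a pre-import filter: the import-time warning is hidden,
# but a later warning leaks because the filter was wiped.
with warnings.catch_warnings(record=True) as leaked:
    warnings.filterwarnings("ignore")
    noisy_import()
    warnings.warn("later warning", UserWarning)
leaked_count = len(leaked)

# Filter before and after the import: nothing gets through.
with warnings.catch_warnings(record=True) as caught:
    warnings.filterwarnings("ignore")  # hides warnings raised during import
    noisy_import()
    warnings.filterwarnings("ignore")  # restores the filter the import removed
    warnings.warn("later warning", UserWarning)
caught_count = len(caught)
```

`catch_warnings` is used here only to keep the demonstration from changing the process-wide filter state.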
Cloud OCR indexing, pageindex dev1 bump, warning cleanup
a56ee15 to 9436ad6
Summary
Add SQLite-backed registry as the default storage backend, with automatic JSON migration support.
Changes
- `DbRegistry` class with SQLite backend, WAL mode, and JSON migration
- `storage_backend` config option (`sqlite` | `json`)
- `get_registry()` factory
Testing
- `hashes.db`, `hashes.db-wal`, `hashes.db-shm`
Backward Compatibility
- `storage_backend: json` still works for existing setups
- Migrates `hashes.json` to `hashes.db` when switching to SQLite
- `hashes.json` preserved after migration for safety
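To make the pieces above concrete, here is a rough sketch of how a SQLite registry with WAL mode, one-time JSON migration, and a backend factory might fit together. Only the names `DbRegistry`, `get_registry`, `storage_backend`, `hashes.json`, and `hashes.db` come from this PR; the table layout, method names, `JsonRegistry`, and the rule that migration runs only when the database file does not exist yet are all assumptions.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Any, Optional


class JsonRegistry:
    """Hypothetical JSON backend: the whole mapping lives in one file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def get(self, key: str) -> Any:
        if not self.path.exists():
            return None
        return json.loads(self.path.read_text(encoding="utf-8")).get(key)


class DbRegistry:
    """Sketch of a SQLite backend; the schema here is an assumption."""

    def __init__(self, db_path: Path, legacy_json: Optional[Path] = None) -> None:
        is_new = not db_path.exists()
        self.conn = sqlite3.connect(db_path)
        # WAL mode creates the -wal and -shm side files next to the database.
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS registry "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        if is_new and legacy_json is not None and legacy_json.exists():
            self._migrate(legacy_json)

    def _migrate(self, legacy_json: Path) -> None:
        entries = json.loads(legacy_json.read_text(encoding="utf-8"))
        with self.conn:  # single transaction: all entries are copied, or none
            self.conn.executemany(
                "INSERT OR IGNORE INTO registry (key, value) VALUES (?, ?)",
                [(k, json.dumps(v)) for k, v in entries.items()],
            )
        # The JSON file is deliberately left in place after migration.

    def get(self, key: str) -> Any:
        row = self.conn.execute(
            "SELECT value FROM registry WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self.conn.close()


def get_registry(kb_dir: Path, storage_backend: str = "sqlite"):
    """Pick a backend from the storage_backend config value."""
    if storage_backend == "json":
        return JsonRegistry(kb_dir / "hashes.json")
    if storage_backend == "sqlite":
        return DbRegistry(kb_dir / "hashes.db", legacy_json=kb_dir / "hashes.json")
    raise ValueError(f"unknown storage_backend: {storage_backend!r}")


with tempfile.TemporaryDirectory() as tmp:
    kb = Path(tmp)
    (kb / "hashes.json").write_text(
        json.dumps({"abc123": {"name": "paper.pdf"}}), encoding="utf-8"
    )
    registry = get_registry(kb, "sqlite")
    migrated = registry.get("abc123")
    journal_mode = registry.conn.execute("PRAGMA journal_mode").fetchone()[0]
    json_kept = (kb / "hashes.json").exists()
    registry.close()
```

Doing the migration inside one transaction means an interrupted run leaves the new database empty rather than half-populated, and keeping `hashes.json` allows a switch back to `storage_backend: json`.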