
Add SQLite-backed registry with JSON migration support #15

Open
kdush wants to merge 31 commits into VectifyAI:main from kdush:feat/sqlite-storage-backend


kdush commented Apr 11, 2026

Summary

Add SQLite-backed registry as the default storage backend, with automatic JSON migration support.

Changes

  • add DbRegistry class with SQLite backend, WAL mode, and JSON migration
  • add storage_backend config option (sqlite | json)
  • update CLI commands to use get_registry() factory
  • document storage backend options and migration behavior
  • add tests for SQLite registry, migration, and backend selection
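
For reviewers who want a picture of how the pieces above might fit together, here is a minimal sketch of backend selection. Only the names `DbRegistry`, `get_registry`, `storage_backend`, `hashes.db`, and `hashes.json` come from this PR; the table schema, the `add`/`get` methods, the `JsonRegistry` class, and the `get_registry` signature are assumptions made for illustration and may not match the actual diff.

```python
import json
import sqlite3
from pathlib import Path


class DbRegistry:
    """Illustrative SQLite-backed registry; the schema is assumed, not taken from the PR."""

    def __init__(self, db_path: Path) -> None:
        self.conn = sqlite3.connect(db_path)
        # WAL mode lets readers keep working while a writer is active.
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes (hash TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, file_hash: str, name: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO hashes (hash, name) VALUES (?, ?)",
                (file_hash, name),
            )

    def get(self, file_hash: str):
        row = self.conn.execute(
            "SELECT name FROM hashes WHERE hash = ?", (file_hash,)
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self.conn.close()


class JsonRegistry:
    """Hypothetical stand-in for the existing JSON registry."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    def add(self, file_hash: str, name: str) -> None:
        self.data[file_hash] = name
        self.path.write_text(json.dumps(self.data), encoding="utf-8")

    def get(self, file_hash: str):
        return self.data.get(file_hash)


def get_registry(kb_dir: Path, config: dict):
    """Pick a backend from the storage_backend option (sqlite is the default)."""
    backend = config.get("storage_backend", "sqlite")
    if backend == "sqlite":
        return DbRegistry(kb_dir / "hashes.db")
    if backend == "json":
        return JsonRegistry(kb_dir / "hashes.json")
    raise ValueError(f"unknown storage_backend: {backend!r}")
```

Routing the CLI commands through one factory keeps the backend choice in a single place, so a command never needs to know which store it is talking to.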

Testing

  • 25 new tests passed for SQLite backend and migration
  • all existing tests pass
  • end-to-end verified: init, add, query, status, list
  • confirmed SQLite creates hashes.db, hashes.db-wal, hashes.db-shm
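
The last check can be reproduced outside the test suite. The sketch below uses only the standard library; the helper name `wal_side_files` and the throwaway table are hypothetical. It switches a database to WAL mode, commits one write, and lists the files SQLite leaves next to it while the connection is still open.

```python
import sqlite3
from pathlib import Path


def wal_side_files(db_path: Path):
    """Return (journal_mode, file names sharing the database's name prefix)."""
    conn = sqlite3.connect(db_path)
    try:
        # The PRAGMA returns the journal mode that is actually in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # List while the connection is open: SQLite normally removes the
        # -wal and -shm files when the last connection closes cleanly.
        names = sorted(p.name for p in db_path.parent.glob(db_path.name + "*"))
        return mode, names
    finally:
        conn.close()
```

Because the side files usually disappear once the last connection closes, seeing only `hashes.db` on disk after a command exits does not by itself mean WAL mode is off.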

Backward Compatibility

  • storage_backend: json still works for existing setups
  • automatic migration from hashes.json to hashes.db when switching to SQLite
  • hashes.json preserved after migration for safety
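
As a rough illustration of the migration behavior listed above, the following sketch copies entries from `hashes.json` into `hashes.db` once and leaves the JSON file in place. It assumes `hashes.json` is a flat mapping from hash to a metadata value and uses a hypothetical two-column table; the function name `migrate_json_to_sqlite` is also an assumption, and the real format and schema in this PR may differ.

```python
import json
import sqlite3
from pathlib import Path


def migrate_json_to_sqlite(kb_dir: Path) -> int:
    """Copy hashes.json into hashes.db once; return the number of migrated entries."""
    json_path = kb_dir / "hashes.json"
    db_path = kb_dir / "hashes.db"
    # Nothing to do if the database already exists or there is no JSON file.
    if db_path.exists() or not json_path.exists():
        return 0

    # Assumed layout: {"<hash>": <metadata>, ...}
    entries = json.loads(json_path.read_text(encoding="utf-8"))

    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS hashes (hash TEXT PRIMARY KEY, meta TEXT NOT NULL)"
        )
        with conn:  # a single transaction, so a failure leaves no partial rows
            conn.executemany(
                "INSERT OR REPLACE INTO hashes (hash, meta) VALUES (?, ?)",
                [(h, json.dumps(m)) for h, m in entries.items()],
            )
    finally:
        conn.close()

    # hashes.json is deliberately left untouched as a fallback copy.
    return len(entries)
```

Keeping the JSON file means a user can switch `storage_backend` back to `json` without losing the pre-migration state.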

KylinMountain and others added 30 commits April 10, 2026 08:04
…klinks

- Add concept dedup with briefs and _read_concept_briefs context
- Add concepts plan and update prompt templates with create/update/related paths
- Extract shared _compile_concepts from compile_short_doc and compile_long_doc
- Add bidirectional backlinks between summaries and concepts
- Code review fixes: security, robustness, tests, and CI hardening

Co-authored-by: Ray <mailtangyu@gmail.com>
- Add get_page_content tool and parse_pages helper for page-level access
- Store long doc sources as per-page JSON extracted by pymupdf
- Unify summary frontmatter to doc_type + full_text fields
- Update schema and tree renderer for new frontmatter format
- All image paths use sources/images/ prefix relative to wiki root

Co-authored-by: Ray <mailtangyu@gmail.com>
- Change default model to gpt-5.4-mini
- Warn when no LLM API key found instead of failing silently
- Fix CI publish workflow and test isolation

Co-authored-by: Ray <mailtangyu@gmail.com>
- Move warning suppression after imports to avoid markitdown override
- Improve init prompts with explicit defaults
- Use American English throughout (initialized, normalized, Synthesize)
- Replace unicode ellipsis with ASCII
- Remove empty explorations/reports dirs from init
- Fix test isolation for _find_kb_dir
- Add get_image tool for viewing images referenced in source documents
- Use ToolOutputImage for proper image content in LLM context
- Update prompt: use full_text field, restrict get_page_content to pageindex
- Add self-talk before tool calls, enforce concise answers
- Prevent duplicate frontmatter in LLM-generated content via schema update
- Add convert_pdf_to_pages for per-page content+image extraction
- All image paths use sources/images/ prefix relative to wiki root
- Remove page marker comments from short doc source markdown
The _CONCEPT_UPDATE_USER prompt asks the LLM for a full rewrite, but
_write_concept was appending the rewrite to the existing body, causing
content duplication on every concept update.
Replace hand-rolled fence stripping with json_repair to handle
malformed JSON, missing fences, and prose-wrapped responses from LLMs.
Also fixes str.index() ValueError on fenced blocks without newlines.
feat: compile pipeline, query agent, and multimodal improvements
…-fixes

fix: compiler concept update bugs
Drop the language and pageindex_threshold prompts from `openkb init`;
both fall back to config defaults and can be edited later in
`.openkb/config.yaml`. In their place, add an interactive API key
prompt that writes `LLM_API_KEY` to `./.env` (chmod 0600) when the
user provides one, so first-time setup no longer requires a separate
manual step. Also polish the model prompt with provider examples and
a link to LiteLLM for others.
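
A minimal sketch of the key-capture step described in this commit, assuming a hypothetical helper named `save_api_key`. Only the `LLM_API_KEY` variable, the `./.env` target, and the 0600 mode come from the commit message. For brevity the sketch overwrites any existing file, which the actual change may well avoid.

```python
import os
from pathlib import Path


def save_api_key(env_path: Path, api_key: str) -> bool:
    """Write LLM_API_KEY to env_path with owner-only permissions.

    Returns False and writes nothing when the user supplied no key.
    """
    api_key = api_key.strip()
    if not api_key:
        return False

    # Create the file with mode 0600 up front so the key is never briefly
    # readable by other users (subject to the process umask on POSIX).
    fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(f"LLM_API_KEY={api_key}\n")

    # If the file already existed with looser permissions, tighten them.
    os.chmod(env_path, 0o600)
    return True
```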
Simplify init prompts and capture API key to .env
When PAGEINDEX_API_KEY is set, index_long_document now fetches
per-page markdown via col.get_page_content() instead of running
local pymupdf. Cloud OCR produces cleaner output (preserves
tables, math, and section headers) than raw pymupdf text
extraction. Falls back to local pymupdf if the cloud call raises
or returns an empty result.
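
The fallback rule in this commit can be sketched as below. The function `extract_pages` and its callable parameters are hypothetical placeholders standing in for the cloud `col.get_page_content()` call and the local pymupdf extraction; they are not the project's actual signatures.

```python
from typing import Callable, List


def extract_pages(
    cloud_fetch: Callable[[], List[str]],
    local_extract: Callable[[], List[str]],
    has_cloud_key: bool,
) -> List[str]:
    """Prefer cloud OCR pages; fall back to local extraction on error or empty result."""
    if has_cloud_key:
        try:
            pages = cloud_fetch()
            if pages:  # an empty result is treated the same as a failure
                return pages
        except Exception:
            # Any cloud error falls through to the local path below.
            pass
    return local_extract()
```

Treating an empty cloud result like an exception means a document is never indexed with zero pages just because the cloud call returned nothing.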
Picks up the cloud add_document poll fix from VectifyAI/PageIndex#226,
which switches the readiness signal from retrieval_ready to
status == "completed".
Move warnings.filterwarnings("ignore") to before the module imports
so pydub's missing-ffmpeg RuntimeWarning, emitted when markitdown
pulls it in, is suppressed. The existing post-import call is kept
because markitdown clobbers the filter state during its own import.
Cloud OCR indexing, pageindex dev1 bump, warning cleanup

3 participants