[FIX] Unable to index local files #1105

ashgillman opened this issue Feb 10, 2025 · 2 comments
Labels
fix Fix something that isn't working as expected

Comments

@ashgillman

Describe the bug

Thank you very much in advance for your help. I'm looking forward to having a client to interact with my notes once I have Khoj set up.

I've managed to get Khoj running offline with Ollama, but I'm unable to sync any files into it. My files are predominantly .org files.

Desktop

With the Desktop application, I've selected a number of files and folders. It says "Connected to Server", but when I hit "force sync", nothing changes in the docker-compose output.
Desktop Config:

Emacs

With the Emacs client, I can run a force update and get a series of log output. However, it ends like this:

Batches:   0%|          | 0/3 [00:00<?, ?it/s]
...

...
Identify new entries: 100%|██████████| 22/22 [00:00<00:00, 1919.35it/s]
server-1    | [05:41:38.247971] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Identified entries to add to
server-1    |                            database in: 0.012 seconds
server-1    |
server-1    |
server-1    | [05:41:38.235812] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Cleared existing dataset for
server-1    |                            regeneration in: 0.005 seconds
server-1    | ntries:   7%|▋         | 2/30 [00:00<00:03,  7.85it/s]
server-1    | ntries:   0%|          | 0/22 [00:00<?, ?it/s]
server-1    |
server-1    |
server-1    |
server-1    |
Hashing Entries: 100%|██████████| 75/75 [00:00<00:00, 455242.84it/s]
server-1    | [05:41:39.717610] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Constructed current entry
server-1    |                            hashes in: 0.001 seconds
server-1    | [05:41:39.718678] DEBUG    khoj.processor.content.org_mod text_to_entries.py:137
server-1    |                            e.org_to_entries: Deleting all
server-1    |                            entries for file type org
server-1    | [05:41:39.726209] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Cleared existing dataset for
server-1    |                            regeneration in: 0.008 seconds
Identify new entries: 100%|██████████| 27/27 [00:00<00:00, 1150.81it/s]onds
server-1    | [05:41:39.751274] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Identified entries to add to
server-1    |                            database in: 0.024 seconds
server-1    |           | 0/1 [00:00<?, ?it/s]
server-1    | s:   0%|          | 0/75 [00:00<?, ?it/s]
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    | ntries:   0%|          | 0/27 [00:00<?, ?it/s]
server-1    |
server-1    |
server-1    |
server-1    | [05:41:39.780873] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:40.413095] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:41.093741] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:42.037019] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:43.448485] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:44.063888] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API

So it runs to here:

logger.info(f"📬 Updating content index via API call by {client} client")

But I never get the success log:
logger.info(f"Finished {method} {t} data sent by {client} client into content index")
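
One way to confirm that requests start but never finish is to count both messages in a saved copy of the server log. The following is a minimal sketch, not part of Khoj: the function names are hypothetical, and the two marker strings are taken from the log statements quoted above. It assumes the docker-compose output wraps each message across several prefixed lines, as in the excerpt, so it strips the service prefix and rejoins the lines before counting.

```python
import re

# Marker substrings taken from the two logger.info calls quoted in this issue.
START_MARKER = "Updating content index via API call"
FINISH_MARKER = "into content index"


def normalize(log_text: str) -> str:
    """Strip docker-compose service prefixes and rejoin wrapped log lines."""
    lines = []
    for line in log_text.splitlines():
        # Remove a leading "server-1    | " style prefix, if present.
        line = re.sub(r"^[\w-]+\s+\|\s?", "", line)
        lines.append(line.strip())
    # Collapse all runs of whitespace so wrapped messages become contiguous.
    return re.sub(r"\s+", " ", " ".join(lines))


def count_requests(log_text: str) -> tuple[int, int]:
    """Return (started, finished) counts of index update requests."""
    text = normalize(log_text)
    return text.count(START_MARKER), text.count(FINISH_MARKER)
```

If the first number keeps growing while the second stays at zero, the server is probably dying or restarting partway through each request. This is only a heuristic, since it depends on how the log lines happen to wrap.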

Emacs config:

(setq khoj-server-url "http://127.0.0.1:42110"
      khoj-index-directories `(,org-directory "~/logseq"))

Web client

Under the admin panel, the "Entrys" database is empty.

To Reproduce

Steps to reproduce the behavior:

Screenshots

If applicable, add screenshots to help explain your problem.

Platform

  • Server:
    • Cloud-Hosted (https://app.khoj.dev)
    • Self-Hosted Docker
    • Self-Hosted Python package
    • Self-Hosted source code
  • Client:
    • Obsidian
    • Emacs
    • Desktop app
    • Web browser
    • WhatsApp
  • OS:
    • Windows
    • macOS
    • Linux
    • Android
    • iOS

If self-hosted

  • Server Version [e.g. 1.0.1]: 1.36.3

Additional context

docker-compose.yml

services:
  database:
    image: ankane/pgvector
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ...
    volumes:
      - khoj_db:/var/lib/postgresql/data/
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 30s
      timeout: 10s
      retries: 5
  sandbox:
    image: ghcr.io/khoj-ai/terrarium:latest
  search:
    image: docker.io/searxng/searxng:latest
    volumes:
      - khoj_search:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:8080/
  server:
    depends_on:
      database:
        condition: service_healthy
    # Use the following line to use the latest version of khoj. Otherwise, it will build from source. Set this to ghcr.io/khoj-ai/khoj-cloud:latest if you want to use the prod image.
    image: ghcr.io/khoj-ai/khoj:latest
    ports:
      - "42110:42110"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    working_dir: /app
    volumes:
      - khoj_config:/root/.khoj/
      - khoj_models:/root/.cache/torch/sentence_transformers
      - khoj_models:/root/.cache/huggingface
    # Use 0.0.0.0 to explicitly set the host ip for the service on the container. https://pythonspeed.com/articles/docker-connection-refused/
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=...
      - POSTGRES_HOST=database
      - POSTGRES_PORT=5432
      - KHOJ_DJANGO_SECRET_KEY=...
      - KHOJ_DEBUG=False
      - KHOJ_ADMIN_EMAIL=...
      - KHOJ_ADMIN_PASSWORD=...
      # Default URL of Terrarium, the Python sandbox used by Khoj to run code. Its container is specified above
      - KHOJ_TERRARIUM_URL=http://sandbox:8080
      # Default URL of SearxNG, the default web search engine used by Khoj. Its container is specified above
      - KHOJ_SEARXNG_URL=http://search:8080
      # Uncomment line below to use with Ollama running on your local machine at localhost:11434.
      # Change URL to use with other OpenAI API compatible providers like VLLM, LMStudio etc.
      - OPENAI_BASE_URL=http://host.docker.internal:11434/v1/
      - JINA_API_KEY=jina_...
    # Comment out this line when you're using the official ghcr.io/khoj-ai/khoj-cloud:latest prod image.
    command: --host="0.0.0.0" --port=42110 -vv --anonymous-mode --non-interactive
@ashgillman ashgillman added the fix Fix something that isn't working as expected label Feb 10, 2025
@ashgillman
Author

I haven't been able to diagnose the issue with the Desktop app, but I found that the problem with the Emacs client was that my repository was perhaps too large. The server seemed to run until it was killed by the OOM killer, and then restarted.
The Desktop app didn't trigger any activity even when I chose a small directory, so that is something else. No matter, I'm happy without it for now.

I've found a solution using the khoj-sync python service (modified) from here: https://gist.github.com/dj311/fad8666c361261ed4af68285a233250a
This works because it lets you choose an upload batch size. Maybe this should be an option in the Emacs interface as well.
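
The batching idea can be sketched roughly as follows. This is not the code from the linked gist and not a Khoj API: `batched`, `sync_in_batches`, and the `upload` callback are hypothetical names used for illustration. The callback stands in for whatever actually sends one group of files to the server.

```python
from pathlib import Path
from typing import Callable, Iterator, Sequence


def batched(paths: Sequence[Path], batch_size: int) -> Iterator[list[Path]]:
    """Yield consecutive groups of at most batch_size paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


def sync_in_batches(
    paths: Sequence[Path],
    batch_size: int,
    upload: Callable[[list[Path]], None],
) -> int:
    """Send paths to the server one batch at a time; return the batch count.

    upload is a hypothetical callback that performs one request per batch,
    so the server never has to process the whole collection at once.
    """
    sent = 0
    for batch in batched(paths, batch_size):
        upload(batch)
        sent += 1
    return sent
```

A smaller batch size means more requests, but less work per request. That is presumably why it avoided the out-of-memory kill here.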

Anyway, I'm happy for this issue to be closed, as a manual Python sync client is working for me, and having this run as a service suits my workflow better anyway.

@NoirJ0e

NoirJ0e commented Mar 15, 2025

Thanks for the info. I encountered the exact same issue from Obsidian, and it was indeed solved with the khoj-sync script.

2 participants