[FIX] Unable to index local files #1105

Open
@ashgillman

Description

Describe the bug

Thank you very much in advance for your help. I'm looking forward to having a client to interact with my notes once I have Khoj set up.

I'm unable to sync my files into Khoj. I've managed to get Khoj running offline with Ollama, but none of my files get indexed. My files are predominantly .org files.

Desktop

With the Desktop Application, I've selected a number of files and folders. It says "Connected to Server", but when I hit "force sync", nothing new appears in the docker-compose log output.
Desktop Config:

Emacs

With the Emacs client, I can run a force update, which produces a stream of log output. However, it ends like this:

Batches:   0%|          | 0/3 [00:00<?, ?it/s]
...

...
Identify new entries: 100%|██████████| 22/22 [00:00<00:00, 1919.35it/s]
server-1    | [05:41:38.247971] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Identified entries to add to
server-1    |                            database in: 0.012 seconds
server-1    |
server-1    |
server-1    | [05:41:38.235812] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Cleared existing dataset for
server-1    |                            regeneration in: 0.005 seconds
server-1    | ntries:   7%|▋         | 2/30 [00:00<00:03,  7.85it/s]
server-1    | ntries:   0%|          | 0/22 [00:00<?, ?it/s]
server-1    |
server-1    |
server-1    |
server-1    |
Hashing Entries: 100%|██████████| 75/75 [00:00<00:00, 455242.84it/s]
server-1    | [05:41:39.717610] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Constructed current entry
server-1    |                            hashes in: 0.001 seconds
server-1    | [05:41:39.718678] DEBUG    khoj.processor.content.org_mod text_to_entries.py:137
server-1    |                            e.org_to_entries: Deleting all
server-1    |                            entries for file type org
server-1    | [05:41:39.726209] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Cleared existing dataset for
server-1    |                            regeneration in: 0.008 seconds
Identify new entries: 100%|██████████| 27/27 [00:00<00:00, 1150.81it/s]onds
server-1    | [05:41:39.751274] DEBUG    khoj.processor.content.org_mode.org_to helpers.py:195
server-1    |                            _entries: Identified entries to add to
server-1    |                            database in: 0.024 seconds
server-1    |           | 0/1 [00:00<?, ?it/s]
server-1    | s:   0%|          | 0/75 [00:00<?, ?it/s]
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    |
server-1    | ntries:   0%|          | 0/27 [00:00<?, ?it/s]
server-1    |
server-1    |
server-1    |
server-1    | [05:41:39.780873] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:40.413095] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:41.093741] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:42.037019] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:43.448485] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API
server-1    |                            call by emacs client
server-1    | [05:41:44.063888] INFO     khoj.routers.api_content: 📬       api_content.py:556
server-1    |                            Updating content index via API

So it runs to here:

logger.info(f"📬 Updating content index via API call by {client} client")

But I never get the success log:
logger.info(f"Finished {method} {t} data sent by {client} client into content index")
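To make the mismatch easier to quantify, here is a minimal sketch that counts how many indexing requests were started versus finished in a saved compose log. It assumes the log was captured to text (for example from `docker compose logs`), that each line carries a `server-1    | ` style prefix, and that the wrapped success message still ends with "into content index" once the lines are rejoined. The function name `count_index_requests` is made up for illustration and is not part of Khoj.

```python
import re

# Prefix that docker compose puts in front of each line, e.g. "server-1    | ".
_COMPOSE_PREFIX = re.compile(r"^[\w.-]+-\d+\s+\|\s?")
# Start marker, taken from the logger.info call quoted above.
_STARTED = re.compile(r"Updating content index via API call by (\S+) client")
# Tail of the success message; assumed to survive line wrapping intact.
_FINISHED = re.compile(r"into content index")


def count_index_requests(log_text: str) -> tuple[int, int]:
    """Return (started, finished) counts of indexing requests in a compose log."""
    # Drop the compose prefix, then collapse all whitespace so that messages
    # the log formatter wrapped over several lines become contiguous again.
    stripped = (_COMPOSE_PREFIX.sub("", line) for line in log_text.splitlines())
    flat = " ".join(" ".join(stripped).split())
    return len(_STARTED.findall(flat)), len(_FINISHED.findall(flat))
```

Run against the excerpt above, the first number should be greater than zero and the second should be zero, which matches what I see by eye: requests arrive but none of them reports completion.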

Emacs config:

(setq khoj-server-url "http://127.0.0.1:42110"
      khoj-index-directories `(,org-directory "~/logseq"))
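As a sanity check on the client side, a small sketch like the following can confirm that the configured directories exist and actually contain .org files. The paths passed in are whatever `khoj-index-directories` expands to on the machine. Whether khoj.el itself recurses into subdirectories is not established by this report, so the recursive search is an assumption, and `count_org_files` is a hypothetical helper name.

```python
from pathlib import Path


def count_org_files(directories: list[str]) -> dict[str, int]:
    """Map each directory to the number of .org files found beneath it."""
    counts = {}
    for directory in directories:
        root = Path(directory).expanduser()  # resolves a leading "~"
        if root.is_dir():
            # Recursive search; assumes nested files are meant to be indexed.
            counts[directory] = sum(1 for _ in root.rglob("*.org"))
        else:
            # -1 marks a missing directory, as opposed to an empty one.
            counts[directory] = -1
    return counts
```

For example, `count_org_files(["~/logseq"])` returns a single-entry dict. A value of 0 or -1 would point at the client configuration and not at the server.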

Web client

Under the admin panel, the "Entrys" database is empty.

Platform

  • Server:
    • Cloud-Hosted (https://app.khoj.dev)
    • Self-Hosted Docker
    • Self-Hosted Python package
    • Self-Hosted source code
  • Client:
    • Obsidian
    • Emacs
    • Desktop app
    • Web browser
    • WhatsApp
  • OS:
    • Windows
    • macOS
    • Linux
    • Android
    • iOS

If self-hosted

  • Server Version [e.g. 1.0.1]: 1.36.3

Additional context

docker-compose.yml

services:
  database:
    image: ankane/pgvector
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ...
    volumes:
      - khoj_db:/var/lib/postgresql/data/
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 30s
      timeout: 10s
      retries: 5
  sandbox:
    image: ghcr.io/khoj-ai/terrarium:latest
  search:
    image: docker.io/searxng/searxng:latest
    volumes:
      - khoj_search:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:8080/
  server:
    depends_on:
      database:
        condition: service_healthy
    # Use the following line to use the latest version of khoj. Otherwise, it will build from source. Set this to ghcr.io/khoj-ai/khoj-cloud:latest if you want to use the prod image.
    image: ghcr.io/khoj-ai/khoj:latest
    ports:
      - "42110:42110"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    working_dir: /app
    volumes:
      - khoj_config:/root/.khoj/
      - khoj_models:/root/.cache/torch/sentence_transformers
      - khoj_models:/root/.cache/huggingface
    # Use 0.0.0.0 to explicitly set the host ip for the service on the container. https://pythonspeed.com/articles/docker-connection-refused/
    environment:
      - POSTGRES_DB=postgres
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=...
      - POSTGRES_HOST=database
      - POSTGRES_PORT=5432
      - KHOJ_DJANGO_SECRET_KEY=...
      - KHOJ_DEBUG=False
      - KHOJ_ADMIN_EMAIL=...
      - KHOJ_ADMIN_PASSWORD=...
      # Default URL of Terrarium, the Python sandbox used by Khoj to run code. Its container is specified above
      - KHOJ_TERRARIUM_URL=http://sandbox:8080
      # Default URL of SearxNG, the default web search engine used by Khoj. Its container is specified above
      - KHOJ_SEARXNG_URL=http://search:8080
      # Uncomment line below to use with Ollama running on your local machine at localhost:11434.
      # Change URL to use with other OpenAI API compatible providers like VLLM, LMStudio etc.
      - OPENAI_BASE_URL=http://host.docker.internal:11434/v1/
      - JINA_API_KEY=jina_...
    # Comment out this line when you're using the official ghcr.io/khoj-ai/khoj-cloud:latest prod image.
    command: --host="0.0.0.0" --port=42110 -vv --anonymous-mode --non-interactive
