fix: Add getting started tutorial to git #870

Merged on Jun 9, 2025 (6 commits)
6 changes: 0 additions & 6 deletions .dlt/config.toml

This file was deleted.

1 change: 1 addition & 0 deletions README.md
@@ -63,6 +63,7 @@ More on [use-cases](https://docs.cognee.ai/use-cases) and [evals](https://github
Get started quickly with a Google Colab <a href="https://colab.research.google.com/drive/1jHbWVypDgCLwjE71GSXhRL3YxYhCZzG1?usp=sharing">notebook</a> , <a href="https://deepnote.com/workspace/cognee-382213d0-0444-4c89-8265-13770e333c02/project/cognee-demo-78ffacb9-5832-4611-bb1a-560386068b30/notebook/Notebook-1-75b24cda566d4c24ab348f7150792601?utm_source=share-modal&utm_medium=product-shared-content&utm_campaign=notebook&utm_content=78ffacb9-5832-4611-bb1a-560386068b30">Deepnote notebook</a> or <a href="https://github.com/topoteretes/cognee-starter">starter repo</a>



## Contributing
Your contributions are at the core of making this a true open source project. Any contributions you make are **greatly appreciated**. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for more information.

19 changes: 19 additions & 0 deletions cognee-starter-kit/.env.template
@@ -0,0 +1,19 @@
# In case you choose to use OpenAI provider, just adjust the model and api_key.
LLM_API_KEY=""
LLM_MODEL="openai/gpt-4o-mini"
LLM_PROVIDER="openai"
# Not needed if you use OpenAI
LLM_ENDPOINT=""
LLM_API_VERSION=""

# In case you choose to use OpenAI provider, just adjust the model and api_key.
EMBEDDING_API_KEY=""
EMBEDDING_MODEL="openai/text-embedding-3-large"
EMBEDDING_PROVIDER="openai"
# Not needed if you use OpenAI
EMBEDDING_ENDPOINT=""
EMBEDDING_API_VERSION=""


GRAPHISTRY_USERNAME=""
GRAPHISTRY_PASSWORD=""
196 changes: 196 additions & 0 deletions cognee-starter-kit/.gitignore
@@ -0,0 +1,196 @@
.data
.env
.local.env
.prod.env
cognee/.data/

code_pipeline_output*/

*.lance/
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

full_run.ipynb

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Cognee logs directory - keep directory, ignore contents
logs/*
!logs/.gitkeep
!logs/README.md

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.env.local
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

.vscode/
cognee/data/
cognee/cache/

# Default cognee system directory, used in development
.cognee_system/
.data_storage/
.artifacts/
.anon_id

node_modules/

# Evals
SWE-bench_testsample/

# ChromaDB Data
.chromadb_data/
98 changes: 98 additions & 0 deletions cognee-starter-kit/README.md
@@ -0,0 +1,98 @@

# Cognee Starter Kit
Welcome to the <a href="https://github.com/topoteretes/cognee">cognee</a> Starter Repo! This repository is designed to help you get started quickly by providing a structured dataset and pre-built data pipelines using cognee to build powerful knowledge graphs.

You can use this repo to ingest, process, and visualize data in minutes.

By following this guide, you will:

- Load structured company and employee data
- Utilize pre-built pipelines for data processing
- Perform graph-based search and query operations
- Visualize entity relationships effortlessly on a graph

# How to Use This Repo 🛠

## Install uv if you don't have it on your system
```
pip install uv
```
Comment on lines +17 to +19

🛠️ Refactor suggestion

Fix markdown formatting and language issues.

The static analysis tools have identified several formatting and language issues that should be addressed for better documentation quality.

 ## Install uv if you don't have it on your system
-```
+```bash
 pip install uv
 ```
 ## Install dependencies
-```
+```bash
 uv sync
 ```

 ## Setup LLM
 Add environment variables to `.env` file.
 In case you choose to use OpenAI provider, add just the model and api_key.
-```
+```bash
 LLM_PROVIDER=""
 LLM_MODEL=""
 LLM_ENDPOINT=""
@@ -40,7 +40,7 @@ EMBEDDING_API_VERSION=""
 ```

 Activate the Python environment:
-```
+```bash
 source .venv/bin/activate
 ```

@@ -48,7 +48,7 @@ source .venv/bin/activate

 This script runs the cognify pipeline with default settings. It ingests text data, builds a knowledge graph, and allows you to run search queries.

-```
+```bash
 python src/pipelines/default.py
 ```

@@ -56,7 +56,7 @@ python src/pipelines/default.py

 This script implements its own pipeline with custom ingestion task. It processes the given JSON data about companies and employees, making it searchable via a graph.

-```
+```bash
 python src/pipelines/low_level.py
 ```

@@ -64,7 +64,7 @@ python src/pipelines/low_level.py

 Custom model uses custom pydantic model for graph extraction. This script categorizes programming languages as an example and visualizes relationships.

-```
+```bash
 python src/pipelines/custom-model.py
 ```

@@ -72,7 +72,7 @@ python src/pipelines/custom-model.py

 cognee provides a visualize_graph function that will render the graph for you.

-```
+```python
 graph_file_path = str(
     pathlib.Path(
         os.path.join(pathlib.Path(__file__).parent, ".artifacts/graph_visualization.html")
@@ -81,10 +81,10 @@ cognee provides a visualize_graph function that will render the graph for you.
 await visualize_graph(graph_file_path)
 ```
 If you want to use tools like Graphistry for graph visualization:
-- create an account and API key from https://www.graphistry.com
-- add the following environment variables to `.env` file:
-```
+- create an account and API key from <https://www.graphistry.com>
+- add the following environment variables to the `.env` file:
+```bash
 GRAPHISTRY_USERNAME=""
 GRAPHISTRY_PASSWORD=""
 ```
-Note: `GRAPHISTRY_PASSWORD` is API key.
+Note: `GRAPHISTRY_PASSWORD` is an API key.



Also applies to: 21-23, 28-40, 43-45, 51-53, 59-61, 67-69, 75-82, 86-89

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

17-17: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In cognee-starter-kit/README.md around lines 17 to 19 and other specified
ranges, fix markdown formatting by replacing plain code blocks with
language-specific fenced code blocks (e.g., use bash or python) for better
syntax highlighting. Correct language issues such as changing "add the following
environment variables to .env file" to "add the following environment
variables to the .env file" and clarify notes like changing
"GRAPHISTRY_PASSWORD is API key" to "GRAPHISTRY_PASSWORD is an API key." Apply
these formatting and language corrections consistently throughout the mentioned
sections.


</details>

<!-- This is an auto-generated comment by CodeRabbit -->

## Install dependencies
```
uv sync
```

## Setup LLM
Add the following environment variables to the `.env` file.
If you use the OpenAI provider, set only the model and the API key.
```
LLM_PROVIDER=""
LLM_MODEL=""
LLM_ENDPOINT=""
LLM_API_KEY=""
LLM_API_VERSION=""

EMBEDDING_PROVIDER=""
EMBEDDING_MODEL=""
EMBEDDING_ENDPOINT=""
EMBEDDING_API_KEY=""
EMBEDDING_API_VERSION=""
```
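
Before running a pipeline, it can help to confirm that the `.env` file actually contains the values you expect. The sketch below is a minimal, stdlib-only check and is not how cognee itself loads configuration. The helper names `parse_env` and `missing_keys` are hypothetical, and treating the model and API-key variables as the required set for OpenAI is an assumption based on the comments in `.env.template`.

```python
from pathlib import Path

# Assumption: with the OpenAI provider, only the model and API key must be set.
REQUIRED_FOR_OPENAI = (
    "LLM_MODEL",
    "LLM_API_KEY",
    "EMBEDDING_MODEL",
    "EMBEDDING_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so that KEY="" counts as empty.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_FOR_OPENAI) -> list[str]:
    """Return the required keys that are absent or empty, in declaration order."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing values:", missing_keys(parse_env(text)))
```

Run it from the project root; an empty list means every variable in the assumed required set has a value.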

Activate the Python environment:
```
source .venv/bin/activate
```

## Run the Default Pipeline

This script runs the cognify pipeline with default settings. It ingests text data, builds a knowledge graph, and allows you to run search queries.

```
python src/pipelines/default.py
```

## Run the Low-Level Pipeline

This script implements its own pipeline with a custom ingestion task. It processes the given JSON data about companies and employees and makes it searchable via a graph.

```
python src/pipelines/low_level.py
```
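
To make the idea of turning the JSON data into a graph concrete, here is a minimal sketch that converts records shaped like `src/data/companies.json` into company-to-department edges. It does not use cognee and is not the code in `low_level.py`. The `Edge` class, the `companies_to_edges` helper, and the `has_department` relation name are all hypothetical.

```python
import json
from dataclasses import dataclass

# Two records in the same shape as src/data/companies.json.
COMPANIES_JSON = """
[
  {"name": "TechNova Inc.", "departments": ["Engineering", "Marketing"]},
  {"name": "Skyline Financials", "departments": ["Accounting"]}
]
"""


@dataclass(frozen=True)
class Edge:
    """A directed, labelled edge between two named nodes (hypothetical type)."""

    source: str
    relation: str
    target: str


def companies_to_edges(companies: list[dict]) -> list[Edge]:
    """Emit one company -> department edge per listed department."""
    return [
        Edge(company["name"], "has_department", department)
        for company in companies
        for department in company["departments"]
    ]


edges = companies_to_edges(json.loads(COMPANIES_JSON))
for edge in edges:
    print(f"{edge.source} -[{edge.relation}]-> {edge.target}")
```

A real ingestion task would hand such nodes and edges to the graph store. This sketch stops at the in-memory representation.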

## Run the Custom Model Pipeline

This script uses a custom Pydantic model for graph extraction. As an example, it categorizes programming languages and visualizes their relationships.

```
python src/pipelines/custom-model.py
```
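
The shape of such a data model can be sketched without any dependencies. The example below uses stdlib dataclasses as a stand-in for Pydantic models and groups languages by category by hand, whereas the real script relies on cognee for extraction. The class names, the `categorize` helper, and the category labels are hypothetical and are not taken from `custom-model.py`.

```python
from dataclasses import dataclass, field


@dataclass
class ProgrammingLanguage:
    """A single language node (hypothetical model)."""

    name: str


@dataclass
class LanguageCategory:
    """A category node that links to the languages it contains."""

    name: str
    languages: list[ProgrammingLanguage] = field(default_factory=list)


def categorize(pairs: list[tuple[str, str]]) -> dict[str, LanguageCategory]:
    """Group (language, category) pairs into one LanguageCategory per category."""
    categories: dict[str, LanguageCategory] = {}
    for language, category in pairs:
        bucket = categories.setdefault(category, LanguageCategory(category))
        bucket.languages.append(ProgrammingLanguage(language))
    return categories


result = categorize([("Python", "dynamic"), ("Rust", "static"), ("Ruby", "dynamic")])
for category in result.values():
    print(category.name, [language.name for language in category.languages])
```

Each category object holds references to its languages, which is the relationship a graph visualization would draw as edges.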

## Graph preview

cognee provides a `visualize_graph` function that renders the graph for you.

```
graph_file_path = str(
    pathlib.Path(
        os.path.join(pathlib.Path(__file__).parent, ".artifacts/graph_visualization.html")
    ).resolve()
)
await visualize_graph(graph_file_path)
```
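
The path expression in the snippet above can also be written with `pathlib` alone. The sketch below only covers the path handling and does not call cognee. The helper name `artifact_path` is hypothetical, and the script path passed to it is just an illustration.

```python
from pathlib import Path


def artifact_path(script_file: str, name: str = "graph_visualization.html") -> str:
    """Return the absolute path of `.artifacts/<name>` next to the given script."""
    return str((Path(script_file).parent / ".artifacts" / name).resolve())


# Inside a pipeline script you would pass __file__ instead of a literal.
graph_file_path = artifact_path("src/pipelines/default.py")
print(graph_file_path)
```

The `/` operator joins path segments, so the result matches the `os.path.join` version while avoiding the extra import.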
If you want to use tools like Graphistry for graph visualization:
- create an account and an API key at https://www.graphistry.com
- add the following environment variables to the `.env` file:
```
GRAPHISTRY_USERNAME=""
GRAPHISTRY_PASSWORD=""
```
Note: `GRAPHISTRY_PASSWORD` is an API key.


# What will you build with cognee?

- Expand the dataset by adding more structured/unstructured data
- Customize the data model to fit your use case
- Use the search API to build an intelligent assistant
- Visualize knowledge graphs for better insights
11 changes: 11 additions & 0 deletions cognee-starter-kit/pyproject.toml
@@ -0,0 +1,11 @@
[project]
name = "cognee-starter"
version = "0.1.1"
description = "Starter project which can be harvested for parts"
readme = "README.md"

requires-python = ">=3.10, <=3.13"

dependencies = [
    "cognee>=0.1.38",
]
38 changes: 38 additions & 0 deletions cognee-starter-kit/src/data/companies.json
@@ -0,0 +1,38 @@
[
  {
    "name": "TechNova Inc.",
    "departments": [
      "Engineering",
      "Marketing"
    ]
  },
  {
    "name": "GreenFuture Solutions",
    "departments": [
      "Research & Development",
      "Sales",
      "Customer Support"
    ]
  },
  {
    "name": "Skyline Financials",
    "departments": [
      "Accounting"
    ]
  },
  {
    "name": "MediCare Plus",
    "departments": [
      "Healthcare",
      "Administration"
    ]
  },
  {
    "name": "NextGen Robotics",
    "departments": [
      "AI Development",
      "Manufacturing",
      "HR"
    ]
  }
]