Golang AI applications have incredible potential. Go offers exceptional speed, easy debugging, first-class concurrency, and excellent libraries for ML, deep learning, and reinforcement learning.
- ADeLe: ADeLe v1.0 is a comprehensive AI evaluation framework that combines explanatory analysis and predictive modeling capabilities to systematically assess AI system performance across multiple dimensions.
- SWELancer: The SWE-Lancer-Benchmark is designed to evaluate the capabilities of frontier LLMs in solving real-world freelance software engineering tasks, exploring their potential to generate economic value through complex software development scenarios.
- ARC-AGI: The Abstraction and Reasoning Corpus.
- ARC-Challenge: AI2 Reasoning Challenge (ARC) Set.
- BBH: Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.
- BIG-bench: Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models.
- GPQA: GPQA: A Graduate-Level Google-Proof Q&A Benchmark.
- HellaSwag: HellaSwag: Can a Machine Really Finish Your Sentence?
- IFEval: IFEval is designed to systematically evaluate the instruction-following capabilities of large language models by incorporating 25 verifiable instruction types (e.g., format constraints, keyword inclusion) and applying dual strict-loose metrics for automated, objective assessment of model compliance.
- LiveBench: A Challenging, Contamination-Free LLM Benchmark.
- MMLU: Measuring Massive Multitask Language Understanding ICLR 2021.
- MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark.
- MMLU-Pro: [NeurIPS 2024] A More Robust and Challenging Multi-Task Language Understanding Benchmark.
- MTEB: Massive Text Embedding Benchmark.
- PIQA: PIQA is a dataset for commonsense reasoning, and was created to investigate the physical knowledge of existing models in NLP.
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale.
- C-Eval: [NeurIPS 2023] A Chinese evaluation suite for foundation models.
- CMMLU: Measuring massive multitask language understanding in Chinese.
- C-SimpleQA: A Chinese Factuality Evaluation for Large Language Models.
- AIME: Evaluation of LLMs on the latest math competitions.
- grade-school-math: The GSM8K dataset contains 8.5K grade school math word problems designed to evaluate multi-step reasoning capabilities in language models, revealing that even large transformers struggle with these conceptually simple yet procedurally complex tasks.
- MATH: The MATH Dataset for NeurIPS 2021, is a benchmark for evaluating mathematical problem-solving capabilities, offering dataset loaders, evaluation code, and pre-training data.
- MathVista: MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts.
- Omni-MATH: Omni-MATH is a comprehensive and challenging benchmark specifically designed to assess LLMs' mathematical reasoning at the Olympiad level.
- TAU-bench: τ-bench is an open-source benchmark that evaluates language agents in dynamic, multi-turn conversations with a simulated user, where the agent must use domain-specific API tools while following policy guidelines.
- AIDER: Aider's leaderboards page compares the performance of various LLMs on programming tasks such as code writing and editing.
- BFCL: The Berkeley Function-Calling Leaderboard (BFCL) provides a thorough study of the function-calling (tool-calling) capability of different LLMs.
- BigCodeBench: [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI.
- Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques.
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation.
- HumanEval: Code for the paper "Evaluating Large Language Models Trained on Code".
- LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code.
- MBPP: The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers, covering programming fundamentals, standard library functionality, and so on.
- MultiPL-E: A multi-programming language benchmark for LLMs.
- multi-swe-bench: The Multi-SWE-bench project, developed by ByteDance's Doubao team, is the first open-source multilingual dataset for evaluating and enhancing large language models' ability to automatically debug code, covering 7 major programming languages (e.g., Java, C++, JavaScript) with real-world GitHub issues to benchmark "full-stack engineering" capabilities.
- SWE-bench: SWE-bench is a benchmark suite designed to evaluate the capabilities of large language models (LLMs) in solving real-world software engineering tasks, focusing on actual software bug-fixing challenges extracted from open-source projects.
- T-Eval: [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step.
- WildBench: Benchmarking LLMs with Challenging Tasks from Real Users.
- Arena-Hard: Arena-Hard-Auto: An automatic LLM benchmark.
- Xstest: Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models".
- DPG-Bench: The DPG benchmark tests a model’s ability to follow complex image generation prompts.
- geneval: GenEval: An object-focused framework for evaluating text-to-image alignment.
- LongVideoBench: [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
- MLVU: Multi-task Long Video Understanding Benchmark.
- perception_test: A Diagnostic Benchmark for Multimodal Video Models, designed to comprehensively evaluate the perception and reasoning skills of multimodal video models.
- TempCompass: A benchmark to evaluate the temporal perception ability of Video LLMs.
- VBench: VBench is an open-source project aiming to build a comprehensive evaluation benchmark for video generation models.
- Video-MME: [CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
- mcp-go: A Go implementation of the Model Context Protocol (MCP), enabling seamless integration between LLM applications and external data sources and tools.
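A minimal sketch of an MCP tool server with mcp-go, following an earlier version of its README example; argument-access helpers have changed across releases, so treat the `req.Params.Arguments` cast as an assumption about the version in use:

```go
package main

import (
	"context"
	"errors"
	"fmt"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

func main() {
	// Create a named MCP server that will speak over stdio.
	s := server.NewMCPServer("demo", "1.0.0")

	// Declare a tool with one required string argument.
	tool := mcp.NewTool("hello_world",
		mcp.WithDescription("Say hello to someone"),
		mcp.WithString("name", mcp.Required(), mcp.Description("Name of the person to greet")),
	)

	// Register the handler invoked via the MCP tools/call method.
	s.AddTool(tool, func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		name, ok := req.Params.Arguments["name"].(string) // version-dependent access
		if !ok {
			return nil, errors.New("name must be a string")
		}
		return mcp.NewToolResultText(fmt.Sprintf("Hello, %s!", name)), nil
	})

	if err := server.ServeStdio(s); err != nil {
		fmt.Println("server error:", err)
	}
}
```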
- mcp-golang: Write Model Context Protocol servers in a few lines of Go code.
- gateway: A universal MCP server for your databases, optimized for LLMs and AI agents.
- gpt-go: Tiny GPT implemented from scratch in pure Go. Trained on Jules Verne books.
- feishu-openai: Feishu (Lark) integrated with GPT-4, GPT-4V, DALL·E-3, and Whisper for an extraordinary work experience.
- chatgpt-telegram: Run your own ChatGPT Telegram bot with a single command.
- openai-go: The official Go library for the OpenAI API.
- go-openai: OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go.
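For a taste of the API, a minimal chat-completion call with go-openai (the token is a placeholder; any model constant the library exposes works here):

```go
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	client := openai.NewClient("your-api-key") // placeholder token
	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: openai.GPT3Dot5Turbo,
			Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "Hello from Go!"},
			},
		},
	)
	if err != nil {
		fmt.Println("completion error:", err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```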
- generative-ai-go: Go SDK for Google Generative AI.
- anthropic-sdk-go: Access to Anthropic's safety-first language model APIs via Go.
- go-anthropic: Anthropic Claude API wrapper for Go.
- deepseek-go: A DeepSeek client for Go supporting R1, Chat V3, and Coder; also supports external providers such as Azure, OpenRouter, and local Ollama.
- ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
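Ollama serves a local REST API, so plain net/http is enough to call it from Go. A sketch against the documented POST /api/generate endpoint, assuming the named model has already been pulled locally:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Request body for Ollama's /api/generate endpoint.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3.2", // assumes this model was pulled with `ollama pull`
		"prompt": "Why is the sky blue?",
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	defer resp.Body.Close()

	// The non-streaming response carries the full completion in "response".
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(out.Response)
}
```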
- go-attention: A full attention mechanism and transformer in pure Go.
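For intuition about what such libraries compute, scaled dot-product attention fits in a few lines of plain Go. This self-contained sketch illustrates the math only and is not go-attention's actual API:

```go
package main

import (
	"fmt"
	"math"
)

// softmax normalizes scores into a probability distribution.
func softmax(xs []float64) []float64 {
	max := math.Inf(-1)
	for _, x := range xs {
		max = math.Max(max, x)
	}
	sum := 0.0
	out := make([]float64, len(xs))
	for i, x := range xs {
		out[i] = math.Exp(x - max) // subtract max for numerical stability
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// attention computes softmax(q·Kᵀ/√d)·V for a single query vector.
func attention(q []float64, keys, values [][]float64) []float64 {
	d := float64(len(q))
	scores := make([]float64, len(keys))
	for i, k := range keys {
		for j := range q {
			scores[i] += q[j] * k[j]
		}
		scores[i] /= math.Sqrt(d)
	}
	weights := softmax(scores)
	out := make([]float64, len(values[0]))
	for i, v := range values {
		for j := range v {
			out[j] += weights[i] * v[j]
		}
	}
	return out
}

func main() {
	q := []float64{1, 0}
	keys := [][]float64{{1, 0}, {0, 1}}
	values := [][]float64{{10, 0}, {0, 10}}
	fmt.Println(attention(q, keys, values)) // weighted toward the first value row
}
```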
- langchaingo: LangChain for Go, the easiest way to write LLM-based programs in Go.
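A minimal langchaingo completion, following its README (assumes OPENAI_API_KEY is set in the environment; helper names may vary between versions):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/llms/openai"
)

func main() {
	ctx := context.Background()
	llm, err := openai.New() // reads OPENAI_API_KEY from the environment
	if err != nil {
		log.Fatal(err)
	}
	completion, err := llms.GenerateFromSinglePrompt(ctx, llm, "Write a haiku about goroutines")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(completion)
}
```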
- gpt4all-bindings: GPT4All Language Bindings provide cross-language interfaces to easily integrate and interact with GPT4All's local LLMs, simplifying model loading and inference for developers.
- llama.go: llama.go is like llama.cpp in pure Golang.
- eino: The ultimate LLM/AI application development framework in Golang.
- fabric: fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
- genkit: An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to develop, integrate, and test AI features with observability and evaluations. Genkit works with various models and platforms.
- swarmgo: SwarmGo (agents-sdk-go) is a Go package that allows you to create AI agents capable of interacting, coordinating, and executing tasks.
- orra: The orra-dev/orra project offers resilience for AI agent workflows.
- core: A fast, agnostic, and powerful Go AI framework for one-shot workflows, building autonomous agents, and working with LLM providers.
- gollm: Unified Go interface for Language Model (LLM) providers. Simplifies LLM integration with flexible prompt management and common task functions.
- milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
- weaviate: Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering while offering the fault tolerance and scalability of a cloud-native database.
- tidb: TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
- pachyderm: Data-Centric Pipelines and Data Versioning.
- MTEB: MTEB (Massive Text Embedding Benchmark) is an open-source benchmarking framework for evaluating and comparing text embedding models across 8 tasks (e.g., classification, retrieval, clustering) using 58 datasets in 112 languages, providing standardized performance metrics for model selection.
- BRIGHT: BRIGHT is a realistic, challenging benchmark for reasoning-intensive retrieval, featuring 12 diverse datasets (math, code, biology, etc.) to evaluate retrieval models on complex, context-rich queries requiring logical inference.
- goml: On-line Machine Learning in Go (and so much more).
- golearn: A simple, customizable, batteries-included ML library in Go.
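golearn follows a scikit-learn-like load/split/fit/predict/evaluate flow. A sketch of its README's KNN example (assumes an iris.csv in the working directory; return signatures have varied slightly across versions):

```go
package main

import (
	"fmt"
	"log"

	"github.com/sjwhitworth/golearn/base"
	"github.com/sjwhitworth/golearn/evaluation"
	"github.com/sjwhitworth/golearn/knn"
)

func main() {
	// Load a CSV where the last column is the class label.
	rawData, err := base.ParseCSVToInstances("iris.csv", true)
	if err != nil {
		log.Fatal(err)
	}

	// 2-nearest-neighbour classifier with Euclidean distance.
	cls := knn.NewKnnClassifier("euclidean", "linear", 2)
	trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
	cls.Fit(trainData)

	predictions, err := cls.Predict(testData)
	if err != nil {
		log.Fatal(err)
	}
	cm, err := evaluation.GetConfusionMatrix(testData, predictions)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(evaluation.GetSummary(cm))
}
```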
- gonum: Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more.
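A small taste of gonum's mat and stat packages:

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
	"gonum.org/v1/gonum/stat"
)

func main() {
	// 2x2 matrix product.
	a := mat.NewDense(2, 2, []float64{1, 2, 3, 4})
	var b mat.Dense
	b.Mul(a, a)
	fmt.Printf("a*a = %v\n", mat.Formatted(&b))

	// Basic statistics over a sample (nil means unweighted).
	xs := []float64{1, 2, 3, 4, 5}
	fmt.Println("mean:", stat.Mean(xs, nil))
	fmt.Println("stddev:", stat.StdDev(xs, nil))
}
```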
- gorgonia: Gorgonia is a library that helps facilitate machine learning in Go.
- spago: Self-contained Machine Learning and Natural Language Processing library in Go.
- goro: A High-level Machine Learning Library for Go.
- goga: Golang Genetic Algorithm.
- hep: hep is the mono repository holding all of go-hep.org/x/hep packages and tools.
- hector: Golang machine learning library.
- sklearn: bits of sklearn ported to Go.
- tokenizer: NLP tokenizers written in Go.
- gobrain: Neural networks written in Go.
- go-neural: Neural network implementation in Go.
- go-deep: Artificial Neural Network.
- olivia: Your new best friend powered by an artificial neural network.
- gomid: A simplistic Neural Network Library in Go.
- neurgo: Neural Network toolkit in Go.
- gonn: GoNN is a neural network implementation in Go, including BPNN, RBF, and PCN.
- gosom: Self-organizing maps in Go.
- go-perceptron-go: A single / multi layer / recurrent neural network written in Golang.
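The single-neuron case these libraries build on is compact enough to write out directly. A self-contained sketch of the classic perceptron learning rule, here learning logical AND (illustrative only, not any particular library's API):

```go
package main

import "fmt"

func main() {
	// Training set for logical AND: output = step(w·x + b).
	inputs := [][]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	targets := []float64{0, 0, 0, 1}
	w := []float64{0, 0}
	b, lr := 0.0, 0.1

	for epoch := 0; epoch < 20; epoch++ {
		for i, x := range inputs {
			sum := w[0]*x[0] + w[1]*x[1] + b
			out := 0.0
			if sum > 0 {
				out = 1
			}
			// Perceptron update: w += lr * (target - output) * x
			errTerm := targets[i] - out
			w[0] += lr * errTerm * x[0]
			w[1] += lr * errTerm * x[1]
			b += lr * errTerm
		}
	}

	for _, x := range inputs {
		sum := w[0]*x[0] + w[1]*x[1] + b
		fmt.Printf("%v -> %v\n", x, sum > 0)
	}
}
```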
- gosl: Linear algebra, eigenvalues, FFT, Bessel, elliptic, orthogonal polys, geometry, NURBS, numerical quadrature, 3D transfinite interpolation, random numbers, Mersenne twister, probability distributions, optimisation, differential equations.
- sparse: Sparse matrix formats for linear algebra supporting scientific and machine learning applications.
- godist: Probability distributions and associated methods in Go.
- CloudForest: CloudForest is a fast, flexible Go library for multi-threaded decision tree ensembles (Random Forest, Gradient Boosting, etc.) designed for high-dimensional heterogeneous data with missing values, emphasizing speed and robustness for real-world machine learning tasks.
- regression: Multivariable regression library in Go.
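The library follows an observe/train/predict pattern; a sketch modeled on the sajari/regression README, with made-up data points:

```go
package main

import (
	"fmt"

	"github.com/sajari/regression"
)

func main() {
	r := new(regression.Regression)
	r.SetObserved("house price")
	r.SetVar(0, "square footage")

	// Hypothetical training data for illustration.
	r.Train(
		regression.DataPoint(245000, []float64{1400}),
		regression.DataPoint(312000, []float64{1600}),
		regression.DataPoint(279000, []float64{1700}),
		regression.DataPoint(308000, []float64{1875}),
	)
	r.Run()

	fmt.Printf("Formula: %v\n", r.Formula)
	prediction, _ := r.Predict([]float64{2000})
	fmt.Printf("Predicted price for 2000 sq ft: %.0f\n", prediction)
}
```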
- ridge: Ridge regression in Go.
- bayesian: Naive Bayesian Classification for Golang.
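Classes are typed constants and training is incremental. A sketch following the jbrukh/bayesian README:

```go
package main

import (
	"fmt"

	"github.com/jbrukh/bayesian"
)

const (
	Good bayesian.Class = "Good"
	Bad  bayesian.Class = "Bad"
)

func main() {
	c := bayesian.NewClassifier(Good, Bad)
	c.Learn([]string{"tall", "rich", "handsome"}, Good)
	c.Learn([]string{"poor", "smelly", "ugly"}, Bad)

	// LogScores returns per-class scores and the index of the winner.
	scores, likely, _ := c.LogScores([]string{"tall", "smelly"})
	fmt.Println("scores:", scores)
	fmt.Println("most likely class index:", likely)
}
```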
- multibayes: Multiclass Naive Bayesian Classification.
- regommend: Recommendation engine for Go.
- gorse: Go Recommender System Engine.
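Gorse runs as a standalone service that the Go client talks to over HTTP. A sketch based on its README; the endpoint, API key, and IDs are placeholders, and client method signatures (e.g., whether they take a context) have varied between releases:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/zhenghaoz/gorse/client"
)

func main() {
	ctx := context.Background()
	// Assumes a Gorse server listening on this address.
	gorse := client.NewGorseClient("http://127.0.0.1:8088", "api_key")

	// Insert implicit feedback: user "bob" starred these items.
	_, err := gorse.InsertFeedback(ctx, []client.Feedback{
		{FeedbackType: "star", UserId: "bob", ItemId: "vuejs:vue", Timestamp: "2022-02-24"},
		{FeedbackType: "star", UserId: "bob", ItemId: "gin-gonic:gin", Timestamp: "2022-02-25"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Fetch the top-10 recommendations for the same user.
	items, err := gorse.GetRecommend(ctx, "bob", "", 10)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(items)
}
```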
- too: Simple recommendation engine implementation built on top of Redis.
- eaopt: Evolutionary optimization library for Go (genetic algorithms, particle swarm optimization, differential evolution).
- evo: Evolutionary Algorithms in Go.
- gogl: A graph library in Go.
- gokmeans: K-means algorithm implemented in Go (golang).
- kmeans: k-means clustering algorithm implementation written in Go.
- morgoth: Metric anomaly detection.
- anomalyzer: Probabilistic anomaly detection for time series data.
- goanomaly: Golang library for anomaly detection. Uses the Gaussian distribution and the probability density formula.
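The Gaussian approach goanomaly describes is easy to see end to end: fit a mean and variance, then flag observations whose probability density falls below a threshold. A self-contained sketch (illustrative, not the library's API):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	data := []float64{9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3}

	// Fit a Gaussian: sample mean and variance.
	mean, variance := 0.0, 0.0
	for _, x := range data {
		mean += x
	}
	mean /= float64(len(data))
	for _, x := range data {
		variance += (x - mean) * (x - mean)
	}
	variance /= float64(len(data))

	// Probability density under N(mean, variance).
	pdf := func(x float64) float64 {
		return math.Exp(-(x-mean)*(x-mean)/(2*variance)) / math.Sqrt(2*math.Pi*variance)
	}

	// Flag observations with density below a chosen threshold.
	const epsilon = 0.01
	for _, x := range []float64{10.0, 13.5} {
		fmt.Printf("x=%.1f density=%.4f anomaly=%v\n", x, pdf(x), pdf(x) < epsilon)
	}
}
```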
- gota: Gota: DataFrames and data wrangling in Go.
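A quick feel for gota's DataFrame API: read CSV from any io.Reader, then filter rows with the comparators from its series package:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/go-gota/gota/dataframe"
	"github.com/go-gota/gota/series"
)

func main() {
	csv := `name,score
alice,90
bob,75
carol,88`

	df := dataframe.ReadCSV(strings.NewReader(csv))

	// Keep rows where score > 80.
	high := df.Filter(dataframe.F{Colname: "score", Comparator: series.Greater, Comparando: 80})
	fmt.Println(high)
}
```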
- dataframe-go: DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration.
- qframe: Immutable data frame for Go.
- lime: Lime: Explaining the predictions of any machine learning classifier.
- Machine Learning With Go
- Machine-Learning-With-Go: example code for the book Machine Learning With Go.
- Machine Learning: Go Language Implementation (机器学习:Go语言实现)
- Go Machine Learning in Action (GO语言机器学习实战)