Starred repositories
An LLM reads a paper and produces a working prototype
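PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that completely preserves the layout, supporting Google/DeepL/Ollama/OpenAI and other services, with CLI/GUI/Docker interfaces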
Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467
Puzzles for learning Triton; play them with minimal environment configuration!
Modeling, training, eval, and inference code for OLMo
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Tiny PyTorch library for maintaining a moving average of a collection of parameters.
Implementation of the proposed minGRU in PyTorch
Build AI agents that have full context. Open source, runs locally, developer-friendly. 24/7 screen, mic, and keyboard recording and control
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
Using FlexAttention to compute attention with different masking patterns
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
Source code of Telegram for macOS in Swift 5.0
⚡️HivisionIDPhotos: a lightweight and efficient AI tool for making ID photos (a lightweight AI algorithm for ID photo production).
NVIDIA Cosmos Nemotron is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
An automated pipeline for evaluating LLMs for role-playing.
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining