Lyu Han lvhan028

97 followers · 10 following

China

Achievements

x3 x2

Stars

huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,234 822 Updated Jan 8, 2025

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 1,907 92 Updated Jan 4, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,332 132 Updated Jan 7, 2025

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 3,425 347 Updated Jan 6, 2025

InternLM / turbomind

C++ 50 2 Updated Dec 4, 2024

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 677 52 Updated Sep 4, 2024

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Cuda 686 29 Updated Sep 21, 2024

microsoft / mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 279 43 Updated Jan 8, 2025

opendatalab / labelU

Data annotation toolbox supports image, audio and video data.

Python 945 92 Updated Jan 8, 2025

opendatalab / PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Python 6,303 413 Updated Jan 3, 2025

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Python 23,584 1,730 Updated Jan 8, 2025

microsoft / MInference

[NeurIPS'24 Spotlight] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 861 39 Updated Dec 28, 2024

pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 5,746 520 Updated Dec 14, 2024

bentoml / OpenLLM

Run any open-source LLMs, such as Llama, Mistral, as OpenAI compatible API endpoint in the cloud.

Python 10,351 657 Updated Jan 6, 2025

bentoml / BentoLMDeploy

Self-host LLMs with LMDeploy and BentoML

Python 17 2 Updated Dec 25, 2024

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 5,113 456 Updated Jan 8, 2025

InternLM / InternLM-XComposer

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,697 163 Updated Dec 26, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

13,415 850 Updated Jan 6, 2025

InternLM / Agent-FLAN

[ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

334 10 Updated Mar 22, 2024

ollama / ollama

Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.

Go 106,431 8,517 Updated Jan 8, 2025

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 21,012 2,311 Updated Aug 12, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 1,746 170 Updated Jan 8, 2025

HqWu-HITCS / Awesome-Chinese-LLM

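A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.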

17,341 1,661 Updated Sep 19, 2024

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 7,166 665 Updated Jan 8, 2025

InternLM / OpenAOE

LLM Group Chat Framework: chat with multiple LLMs at the same time.

TypeScript 260 23 Updated Apr 10, 2024

InternLM / HuixiangDou

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

Python 1,679 134 Updated Jan 7, 2025

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 8,045 418 Updated Sep 6, 2024

state-spaces / mamba

Mamba SSM architecture

Python 13,718 1,176 Updated Jan 6, 2025

microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 1,937 176 Updated Nov 20, 2024

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,100 1,050 Updated Jan 8, 2025
