Which LLM to Use
Find the ‘right’ model for your use case.
Models
See all models →
AI pioneers train, fine-tune, and run frontier models on our GPU cloud platform.
Build with open-source and specialized multimodal models for chat, images, code, and more. Migrate from closed models with OpenAI-compatible APIs.
Chat
DeepSeek's latest open Mixture-of-Experts model challenging top AI models at much lower cost.
TRY THIS MODEL
Chat
Hybrid instruct + reasoning model (232Bx22B MoE) optimized for high-throughput, cost-efficient inference and distillation.
TRY THIS MODEL
Chat
State-of-the-art mixture-of-experts agentic intelligence model with 1T parameters, 128K context, and native tool use.
TRY THIS MODEL
Chat
SOTA 128-expert MoE powerhouse for multilingual image/text understanding, creative writing, and enterprise-scale applications.
TRY THIS MODEL
Transcribe
High-performance speech-to-text model delivering transcription 15x faster than OpenAI with support for 1GB+ files, 50+ languages, and production-ready infrastructure.
TRY THIS MODEL
Chat
Lightweight model with vision-language input, multilingual support, visual reasoning, and top-tier performance per size.
TRY THIS MODEL
Image
In-context image generation and editing model using both text and image inputs.
TRY THIS MODEL
Chat
Selective parameter activation delivers 2B/4B multimodal performance on low-resource devices, handling text, image, video, and audio.
TRY THIS MODEL
Chat
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The instruction-tuned, text-only Llama 3.3 model is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks.
TRY THIS MODEL
Chat
DeepSeek-R1 (Throughput) is a state-of-the-art reasoning model trained with reinforcement learning. It delivers strong performance on math, code, and logic tasks, comparable to OpenAI o1, and is especially good at tasks like code review, document analysis, planning, information extraction, and coding.
TRY THIS MODEL
Image
In-context image generation and editing model with enhanced prompt adherence.
TRY THIS MODEL
Chat
456B-parameter hybrid MoE reasoning model with 40K thinking budget, lightning attention, and 1M token context for efficient reasoning and problem-solving tasks.
TRY THIS MODEL
Chat
Powerful decoder-only models available in 7B and 72B variants, developed by Alibaba Cloud's Qwen team for advanced language processing.
TRY THIS MODEL
Chat
Upgraded DeepSeek-R1 with better reasoning, function calling, and coding, using 23K-token thinking to score 87.5% on AIME.
TRY THIS MODEL
Image
Free endpoint for the SOTA open-source image generation model by Black Forest Labs.
TRY THIS MODEL
Audio
Low-latency, ultra-realistic voice model, served in partnership with Cartesia.
TRY THIS MODEL
Chat
DeepSeek-R1 model post-trained by Perplexity AI to remove censorship and bias while preserving reasoning strength.
TRY THIS MODEL
Chat
NVIDIA NIM for GPU-accelerated Llama 3.1 Nemotron 70B Instruct inference through OpenAI-compatible APIs.
TRY THIS MODEL
Code
SOTA code LLM with advanced code generation, reasoning, fixing, and support for up to 128K tokens.
TRY THIS MODEL
Image
Premium image generation model by Black Forest Labs.
TRY THIS MODEL
Vision
Vision-language model with advanced visual reasoning, video understanding, structured outputs, and agentic capabilities.
TRY THIS MODEL
Chat
A 32B SLM optimized for managing complex tool-based interactions and API function calls. Its strength lies in precise execution, intelligent orchestration, and effective communication between systems – making it ideal for automation pipelines.
TRY THIS MODEL
Chat
SOTA 109B model with 17B active params & large context, excelling at multi-document analysis, codebase reasoning, and personalized tasks.
TRY THIS MODEL
Chat
A versatile and powerful 32B SLM, capable of handling varied tasks with precision and adaptability across multiple domains. Ideal for dynamic use cases that require significant computational power.
TRY THIS MODEL
Leverage pre-trained models, fine-tune them for your needs, or build custom models from scratch. Whatever your generative AI needs, Together AI offers a seamless continuum of AI compute solutions to support your entire journey.
Train
Models
Powered by the Together Inference Engine, combining research-driven innovation with deployment flexibility.
Transformer-optimized kernels: our researchers' custom FP8 inference kernels, 75%+ faster than base PyTorch
Quality-preserving quantization: accelerating inference while maintaining accuracy with advances such as QTIP
Speculative decoding: faster throughput, powered by novel algorithms and draft models trained on the RedPajama dataset (a toy sketch of the core idea follows below)
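Since speculative decoding is the least self-explanatory item above, here is a toy, greedy-only sketch of the core idea. It is a conceptual illustration, not the Together Inference Engine's implementation; the two toy "models" at the bottom stand in for a real draft/target pair.

# Toy sketch of speculative decoding (greedy variant), for intuition only.
# A cheap draft model proposes k tokens; the expensive target model verifies them
# and keeps the longest agreeing prefix, so several tokens can land per round.
# Real engines verify all k drafted positions in one batched target forward pass;
# this sketch calls the target per position purely for clarity.
from typing import Callable, List

def speculative_decode(
    target_next: Callable[[List[str]], str],  # expensive model: context -> next token
    draft_next: Callable[[List[str]], str],   # cheap model: context -> next token
    prompt: List[str],
    k: int = 4,
    max_new_tokens: int = 10,
) -> List[str]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft k candidate tokens with the cheap model.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. Verify against the target, accepting the agreeing prefix.
        accepted, correction = [], None
        for i in range(k):
            expected = target_next(tokens + accepted)
            if draft[i] == expected:
                accepted.append(expected)
            else:
                correction = expected  # target overrides the first disagreement
                break
        tokens.extend(accepted)
        if correction is not None:
            tokens.append(correction)
    return tokens

# Toy stand-ins: the "target" continues the alphabet; the "draft" is usually right.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
target_next = lambda ctx: ALPHABET[len(ctx) % 26]
draft_next = lambda ctx: ALPHABET[len(ctx) % 26] if len(ctx) % 5 else "?"
print(speculative_decode(target_next, draft_next, prompt=["a"]))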
Turbo: Best performance without losing accuracy
Reference: Full precision, available for 100% accuracy
Lite: Optimized for fast performance at the lowest cost
Dedicated instances: fast, consistent performance, without rate limits, on your own single-tenant NVIDIA GPUs
Serverless API: quickly switch from closed LLMs to models like Llama, using our OpenAI-compatible APIs (see the sketch below)
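As a rough illustration of switching via the OpenAI-compatible Serverless API, the sketch below points the standard openai Python client at Together's endpoint. The environment variable name, model id, and prompt are placeholders chosen for the example; substitute any serverless chat model from the catalog above.

# Minimal sketch: calling a Together serverless model through the OpenAI-compatible API.
# Assumes the `openai` Python package is installed and an API key is available.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],   # placeholder env var holding your key
    base_url="https://api.together.xyz/v1",   # Together's OpenAI-compatible endpoint
)

# "meta-llama/Llama-3.3-70B-Instruct-Turbo" is an example model id; pick any serverless chat model.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Summarize the benefits of open-source LLMs in two sentences."}],
)
print(response.choices[0].message.content)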
Fine-tune open-source models like Llama on your data and run them on Together Cloud or in a hyperscaler VPC. With no vendor lock-in, your AI remains fully under your control.
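The upload step below expects the training data as a JSONL file, one JSON object per line. A minimal sketch for producing such a file is shown here; the single "text" field per record is an assumption for illustration, so match the record shape to whatever format the model you are fine-tuning requires.

# Minimal sketch: writing a fine-tuning dataset as JSONL (one JSON object per line).
# The {"text": ...} record shape is an illustrative assumption; adjust it to the
# format expected by your target model.
import json

examples = [
    "Customer: How do I reset my password?\nAgent: Use the 'Forgot password' link on the sign-in page.",
    "Customer: Where can I find my invoices?\nAgent: They are listed under Billing in your account settings.",
]

with open("acme_corp_customer_support.jsonl", "w") as f:
    for text in examples:
        f.write(json.dumps({"text": text}) + "\n")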
together files upload acme_corp_customer_support.jsonl
{
  "filename": "acme_corp_customer_support.jsonl",
  "id": "file-aab9997e-bca8-4b7e-a720-e820e682a10a",
  "object": "file"
}
together finetune create --training-file file-aab9997e-bca8-4b7e-a720-e820e682a10a \
  --model togethercomputer/RedPajama-INCITE-7B-Chat
together finetune create --training-file $FILE_ID \
  --model $MODEL_NAME \
  --wandb-api-key $WANDB_API_KEY \
  --n-epochs 10 \
  --n-checkpoints 5 \
  --batch-size 8 \
  --learning-rate 0.0003
"training_file": "file-aab9997-bca8-4b7e-a720-e820e682a10a",
"model_output_name": "username/togethercomputer/llama-2-13b-chat",
"model_output_path": "s3://together/finetune/63e2b89da6382c4d75d5ef22/username/togethercomputer/llama-2-13b-chat",
"Suffix": "Llama-2-13b 1",
"model": "togethercomputer/llama-2-13b-chat",
"n_epochs": 4,
"batch_size": 128,
"learning_rate": 1e-06,
"checkpoint_steps": 2,
"created_at": 1687982945,
"updated_at": 1687982945,
"status": "pending",
"id": "ft-5bf8990b-841d-4d63-a8a3-5248d73e045f",
"epochs_completed": 3,
"events": [
"object": "fine-tune-event",
"created_at": 1687982945,
"message": "Fine tune request created",
"type": "JOB_PENDING",
"queue_depth": 0,
"wandb_project_name": "Llama-2-13b Fine-tuned 1"
Forge the AI frontier.
Train on expert-built GPU clusters.
Built by AI researchers for AI innovators, Together GPU Clusters are powered by NVIDIA GB200, H200, and H100 GPUs, along with the Together Kernel Collection — delivering up to 24% faster training operations.
Robust Management Tools
Slurm and Kubernetes orchestrate dynamic AI workloads, optimizing training and inference seamlessly.
Training-ready clusters – Blackwell and Hopper
THE AI ACCELERATION CLOUD
BUILT ON LEADING AI RESEARCH.
Innovations
Our research team is behind breakthrough AI models, datasets, and optimizations.
Customer Stories
See how we support leading teams around the world. Our customers are creating innovative generative AI applications, faster.
Testing conducted by Together AI in November 2023 using Llama-2-70B running on Together Inference, TGI, vLLM, Anyscale, Perplexity, and OpenAI. MosaicML comparison based on published numbers in the MosaicML blog. Detailed results and methodology published here.
Testing conducted by Together AI in November 2023 using Llama-2-70B running on Together Inference. Detailed results and methodology published here.
Based on published pricing November 8th, 2023, comparing OpenAI GPT-3.5-Turbo to Llama-2-13B on Together Inference using Serverless Endpoints. Assumes an equal number of input and output tokens.
Compared to a standard attention implementation in PyTorch, FlashAttention-2 can be up to 9x faster. Source.
Testing methodology and results published in this research paper.
Based on published pricing November 8th, 2023, comparing AWS Capacity Blocks and AWS p5.48xlarge instances to Together GPU Clusters configured with an equal number of H100 SXM5 GPUs on our 3200 Gbps InfiniBand networking configuration.