Groq API Integration

Available Models

Llama 3.1 405B Instruct

The largest openly available foundation model with 405B parameters. Groq delivers near-instantaneous inference on this massive model.

  • API Name: llama-3.1-405b-instruct
  • 405B parameters
  • 128K context window
  • Fast AI inference with GroqCloud™

Llama 3.1 70B Instruct

High-performance 70B model optimized for speed and efficiency. Perfect balance of capability and cost-effectiveness.

  • API Name: llama-3.1-70b-instruct
  • 70B parameters
  • 128K context window
  • Ultra-fast inference speeds

Llama 3.1 8B Instruct

Lightweight yet powerful model for cost-effective applications. Delivers exceptional speed for everyday tasks.

  • API Name: llama-3.1-8b-instruct
  • 8B parameters
  • 128K context window
  • Most cost-effective option

Llama 3 Groq Tool Use

Specialized models for function calling and tool use applications. Available in 70B and 8B variants.

  • Llama-3-Groq-70B-Tool-Use
  • Llama-3-Groq-8B-Tool-Use
  • Optimized for function calling
  • Enhanced tool integration

Mixtral 8x7B

Mixture of Experts model from Mistral AI with excellent performance across various tasks and fast inference.

  • API Name: mixtral-8x7b-32768
  • Mixture of Experts architecture
  • 32K context window
  • Proven performance on benchmarks

Gemma 7B IT

Google's Gemma model optimized for instruction following and chat applications with Groq's speed advantage.

  • API Name: gemma-7b-it
  • 7B parameters
  • Instruction-tuned
  • Blazing fast inference
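All of the chat models above are served through Groq's OpenAI-compatible chat completions endpoint. A minimal Swift sketch of building such a request (the endpoint path and header format follow Groq's public API docs and should be verified there before use):

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Sketch: build a request against Groq's OpenAI-compatible
// chat completions endpoint. Path and headers are assumptions
// based on Groq's public docs; confirm against current documentation.
func buildGroqChatRequest(apiKey: String, model: String, prompt: String) -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.groq.com/openai/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": model,
        "messages": [["role": "user", "content": prompt]]
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)
    return request
}
```

Swapping the `model` string is all it takes to move between the models listed above; the request shape stays the same.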

⚠️ Important Notice

Information about models, pricing, and features may be outdated or incorrect. Always consult the official provider documentation for the most current and accurate data.

Key Features

LPU™ Fast AI Inference

Groq's Language Processing Units deliver faster inference at lower cost than conventional GPU-based systems, with near-instantaneous results.

  • Up to 1200 tokens per second
  • Low-latency, low-cost inference
  • Consistent performance at scale
  • Tiered rate limits, with higher limits on paid plans
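As a rough worked example of what these decode speeds mean for latency (illustrative figures, not measured Groq benchmarks):

```swift
// Illustrative arithmetic only: time to stream a completion at a
// given decode speed. Numbers are examples, not Groq measurements.
func estimatedStreamSeconds(outputTokens: Int, tokensPerSecond: Double) -> Double {
    Double(outputTokens) / tokensPerSecond
}

// A 600-token answer at 1200 tokens/s arrives in about half a second.
```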

Enterprise Ready

GroqCloud™ platform with 360,000+ developers building applications. Self-serve developer tier and enterprise access available.

  • OpenAI endpoint compatibility
  • Batch API with discounted rates
  • 24-hour turnaround for batch processing
  • $640M Series D funding (2024)

Model Variety

Access to latest open-source models including Llama 3.1, Mixtral, Gemma, plus ASR models like Whisper and vision capabilities.

  • LLMs and Vision models
  • Automatic Speech Recognition (ASR)
  • Whisper Large V3 from OpenAI
  • Function calling support
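Function-calling models accept OpenAI-style tool definitions in the request body. A sketch of encoding one in Swift (field names follow the OpenAI chat completions format; the JSON-Schema `parameters` object is omitted to keep the sketch small, and the tool name here is hypothetical):

```swift
import Foundation

// An OpenAI-style tool definition, as sent in the "tools" array of a
// chat completions request. The JSON-Schema `parameters` field is
// omitted for brevity; real tool definitions should include it.
struct ToolFunction: Codable {
    let name: String
    let description: String
}

struct Tool: Codable {
    let type: String          // always "function" in this format
    let function: ToolFunction
}

// Hypothetical example tool, for illustration only.
let weatherTool = Tool(
    type: "function",
    function: ToolFunction(name: "get_weather",
                           description: "Get the current weather for a city")
)
let encoded = try! JSONEncoder().encode(weatherTool)
```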

Pay-Per-Use Pricing

Transparent pricing per million tokens for LLMs, per hour for ASR models, with batch processing discounts available.

  • Per million input/output tokens
  • Per hour for audio transcription
  • Minimum charge per request for ASR
  • Batch processing discounts
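Per-million-token billing is simple to estimate client-side. A sketch with placeholder rates (the numbers are hypothetical, not Groq's actual prices; check the official pricing page):

```swift
// Estimate request cost under per-million-token pricing.
// All rates passed in are placeholders; consult Groq's pricing page.
func estimatedCostUSD(inputTokens: Int, outputTokens: Int,
                      inputRatePerMillion: Double,
                      outputRatePerMillion: Double) -> Double {
    Double(inputTokens) / 1_000_000 * inputRatePerMillion
        + Double(outputTokens) / 1_000_000 * outputRatePerMillion
}
```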

Technical Specifications

  • Secure API key storage with iOS Keychain
  • Native iOS integration with SwiftUI
  • Support for streaming responses
  • Automatic token counting and cost estimation
  • Real-time response monitoring
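Streaming responses follow the OpenAI-compatible server-sent-events format: each chunk arrives as a `data: {json}` line, and the stream ends with `data: [DONE]`. A minimal parsing sketch (the payload shape is assumed from that format, not from this app's code):

```swift
import Foundation

// Extract the JSON payloads from a buffer of server-sent-event lines.
// Each event is "data: {json}"; the sentinel "data: [DONE]" marks
// the end of the stream and is dropped.
func extractEventPayloads(from raw: String) -> [String] {
    raw.split(separator: "\n")
        .map { String($0) }
        .filter { $0.hasPrefix("data: ") }
        .map { String($0.dropFirst("data: ".count)) }
        .filter { $0 != "[DONE]" }
}
```

In practice each payload would then be decoded into a chunk struct and its delta text appended to the visible response.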