Groq API Integration
Available Models
Llama 3.1 405B Instruct
The largest openly available foundation model with 405B parameters. Groq delivers near-instantaneous inference on this massive model.
- API Name: llama-3.1-405b-instruct
- 405B parameters
- 128K context window
- Fast AI inference with GroqCloud™
Llama 3.1 70B Instruct
High-performance 70B model optimized for speed and efficiency. Perfect balance of capability and cost-effectiveness.
- API Name: llama-3.1-70b-instruct
- 70B parameters
- 128K context window
- Ultra-fast inference speeds
Llama 3.1 8B Instruct
Lightweight yet powerful model for cost-effective applications. Delivers exceptional speed for everyday tasks.
- API Name: llama-3.1-8b-instruct
- 8B parameters
- 128K context window
- Most cost-effective option
Llama 3 Groq Tool Use
Specialized models for function calling and tool use applications. Available in 70B and 8B variants.
- Llama-3-Groq-70B-Tool-Use
- Llama-3-Groq-8B-Tool-Use
- Optimized for function calling
- Enhanced tool integration
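Since Groq's chat endpoint follows the OpenAI request format, tool use is expressed by attaching a JSON-schema tool definition to the request. A minimal sketch, assuming that format; the `get_weather` function is hypothetical, and the model name is the one listed above (verify it against current Groq docs):

```python
def weather_tool() -> dict:
    """JSON-schema description of a hypothetical get_weather function."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }

def tool_use_request(prompt: str) -> dict:
    """Request body pairing a tool-use model with the tool list."""
    return {
        "model": "Llama-3-Groq-70B-Tool-Use",  # as listed above; check current docs
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool()],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
```

The model responds with a `tool_calls` entry naming the function and its arguments; your code executes the function and sends the result back as a `tool` role message.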
Mixtral 8x7B
Mixture of Experts model from Mistral AI with excellent performance across various tasks and fast inference.
- API Name: mixtral-8x7b-32768
- Mixture of Experts architecture
- 32K context window
- Proven performance on benchmarks
Gemma 7B IT
Google's Gemma model optimized for instruction following and chat applications with Groq's speed advantage.
- API Name: gemma-7b-it
- 7B parameters
- Instruction-tuned
- Blazing fast inference
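Each model above is selected by passing its API name in a chat-completion request. A minimal payload sketch against Groq's OpenAI-compatible endpoint; the model names are the ones listed above and may have changed, so verify them against current documentation:

```python
# Groq's documented OpenAI-compatible chat-completions endpoint.
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"

def chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build a chat-completion request body for the given model."""
    return {
        "model": model,  # e.g. "llama-3.1-8b-instruct"; confirm in current docs
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "stream": stream,
    }

# POST this payload as JSON to GROQ_CHAT_URL with an
# "Authorization: Bearer <GROQ_API_KEY>" header to get a completion.
```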
⚠️ Important Notice
Information about models, pricing, and features may be outdated or incorrect. Always consult the official provider documentation for the most current and accurate data.
Key Features
LPU™ Fast AI Inference
Groq's Language Processing Units deliver faster inference at lower cost than GPU-based alternatives, with near-instantaneous results.
- Up to 1200 tokens per second
- Low-latency, low-cost inference
- Consistent performance at scale
- No throttling or rate limiting
Enterprise Ready
GroqCloud™ platform with 360,000+ developers building applications. Self-serve developer tier and enterprise access available.
- OpenAI endpoint compatibility
- Batch API with discounted rates
- 24-hour turnaround for batch processing
- $640M Series D funding (2024)
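OpenAI endpoint compatibility means an existing OpenAI-style client can talk to Groq by swapping the base URL and API key. A minimal sketch of the settings involved; the base URL is Groq's documented OpenAI-compatible endpoint:

```python
# Groq's documented OpenAI-compatible base URL.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def make_client_config(api_key: str) -> dict:
    """Settings an OpenAI-style client needs to target Groq instead."""
    return {"base_url": GROQ_BASE_URL, "api_key": api_key}

# With the official openai package installed, usage would look like:
#   from openai import OpenAI
#   client = OpenAI(**make_client_config(os.environ["GROQ_API_KEY"]))
#   resp = client.chat.completions.create(model=..., messages=[...])
```

Because only the base URL changes, the same request and response shapes (messages, tools, streaming chunks) carry over unchanged.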
Model Variety
Access to the latest open-source models, including Llama 3.1, Mixtral, and Gemma, as well as ASR models such as Whisper and models with vision capabilities.
- LLMs and Vision models
- Automatic Speech Recognition (ASR)
- Whisper Large V3 from OpenAI
- Function calling support
Pay-Per-Use Pricing
Transparent pricing: per million tokens for LLMs and per hour of audio for ASR models, with batch processing discounts available.
- Per million input/output tokens
- Per hour for audio transcription
- Minimum charge per request for ASR
- Batch processing discounts
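Per-million-token pricing makes cost estimation straightforward: weight input and output tokens by their respective rates. A back-of-the-envelope sketch; the rates below are placeholders, not Groq's actual prices:

```python
def estimate_llm_cost(input_tokens: int, output_tokens: int,
                      in_rate: float, out_rate: float) -> float:
    """Estimate USD cost; in_rate/out_rate are USD per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. 50k input + 10k output tokens at hypothetical $0.50/$0.80 per 1M tokens:
cost = estimate_llm_cost(50_000, 10_000, 0.50, 0.80)
```

Batch-processed requests would apply the advertised discount to the same figure; ASR cost is instead computed from hours of audio, subject to the per-request minimum.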
Technical Specifications
- Secure API key storage with iOS Keychain
- Native iOS integration with SwiftUI
- Support for streaming responses
- Automatic token counting and cost estimation
- Real-time response monitoring
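Streaming responses from OpenAI-compatible endpoints arrive as server-sent-event lines of the form `data: {json}`, terminated by `data: [DONE]`. A sketch of extracting the text deltas from such a stream, assuming that chunk format:

```python
import json

def extract_deltas(sse_lines) -> str:
    """Collect content fragments from streamed chat-completion chunks."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)
    return "".join(parts)
```

In a client, the same per-chunk loop is where incremental token counts and running cost estimates would be updated as text arrives.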