Groq API Integration
Available Models
Llama 3.1 405B Instruct
The largest openly available foundation model with 405B parameters. Groq delivers near-instantaneous inference on this massive model.
- API Name: llama-3.1-405b-instruct
- 405B parameters
- 128K context window
- Fast AI inference with GroqCloud™
Llama 3.1 70B Instruct
High-performance 70B model optimized for speed and efficiency. Perfect balance of capability and cost-effectiveness.
- API Name: llama-3.1-70b-instruct
- 70B parameters
- 128K context window
- Ultra-fast inference speeds
Llama 3.1 8B Instruct
Lightweight yet powerful model for cost-effective applications. Delivers exceptional speed for everyday tasks.
- API Name: llama-3.1-8b-instruct
- 8B parameters
- 128K context window
- Most cost-effective option
Llama 3 Groq Tool Use
Specialized models for function calling and tool use applications. Available in 70B and 8B variants.
- Llama-3-Groq-70B-Tool-Use
- Llama-3-Groq-8B-Tool-Use
- Optimized for function calling
- Enhanced tool integration
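Since Groq's chat endpoint follows the OpenAI request format, tool use is expressed by attaching a JSON-schema tool definition to the request. A minimal sketch, assuming that format; the `get_weather` function is hypothetical, and the model name is the one listed above (verify it against current Groq docs):

```python
def weather_tool() -> dict:
    """JSON-schema description of a hypothetical get_weather function."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }

def tool_use_request(prompt: str) -> dict:
    """Request body pairing a tool-use model with the tool list."""
    return {
        "model": "Llama-3-Groq-70B-Tool-Use",  # as listed above; check current docs
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool()],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
```

The model responds with a `tool_calls` entry naming the function and its arguments; your code executes the function and sends the result back as a `tool` role message.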
Mixtral 8x7B
Mixture of Experts model from Mistral AI with excellent performance across various tasks and fast inference.
- API Name: mixtral-8x7b-32768
- Mixture of Experts architecture
- 32K context window
- Proven performance on benchmarks
Gemma 7B IT
Google's Gemma model optimized for instruction following and chat applications with Groq's speed advantage.
- API Name: gemma-7b-it
- 7B parameters
- Instruction-tuned
- Blazing fast inference
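Each model above is selected by passing its API name in a chat-completion request. A minimal payload sketch against Groq's OpenAI-compatible endpoint; the model names are the ones listed above and may have changed, so verify them against current documentation:

```python
# Groq's documented OpenAI-compatible chat-completions endpoint.
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"

def chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build a chat-completion request body for the given model."""
    return {
        "model": model,  # e.g. "llama-3.1-8b-instruct"; confirm in current docs
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "stream": stream,
    }

# POST this payload as JSON to GROQ_CHAT_URL with an
# "Authorization: Bearer <GROQ_API_KEY>" header to get a completion.
```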
⚠️ Important Notice
Information about models, pricing, and features may be outdated or incorrect. Always consult the official provider documentation for the most current and accurate data.
Key Features
LPU™ Fast AI Inference
Groq's Language Processing Units deliver faster inference at lower cost than GPU-based alternatives, with near-instantaneous results.
- Up to 1200 tokens per second
- Low-latency, low-cost inference
- Consistent performance at scale
- No throttling or rate limiting
Enterprise Ready
GroqCloud™ platform with 360,000+ developers building applications. Self-serve developer tier and enterprise access available.
- OpenAI endpoint compatibility
- Batch API with discounted rates
- 24-hour turnaround for batch processing
- $640M Series D funding (2024)
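OpenAI endpoint compatibility means an existing OpenAI-style client can talk to Groq by swapping the base URL and API key. A minimal sketch of the settings involved; the base URL is Groq's documented OpenAI-compatible endpoint:

```python
# Groq's documented OpenAI-compatible base URL.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def make_client_config(api_key: str) -> dict:
    """Settings an OpenAI-style client needs to target Groq instead."""
    return {"base_url": GROQ_BASE_URL, "api_key": api_key}

# With the official openai package installed, usage would look like:
#   from openai import OpenAI
#   client = OpenAI(**make_client_config(os.environ["GROQ_API_KEY"]))
#   resp = client.chat.completions.create(model=..., messages=[...])
```

Because only the base URL changes, the same request and response shapes (messages, tools, streaming chunks) carry over unchanged.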
Model Variety
Access to the latest open-source models, including Llama 3.1, Mixtral, and Gemma, as well as ASR models such as Whisper and models with vision capabilities.
- LLMs and Vision models
- Automatic Speech Recognition (ASR)
- Whisper Large V3 from OpenAI
- Function calling support
Pay-Per-Use Pricing
Transparent pricing: per million tokens for LLMs and per hour of audio for ASR models, with batch processing discounts available.
- Per million input/output tokens
- Per hour for audio transcription
- Minimum charge per request for ASR
- Batch processing discounts
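Per-million-token pricing makes cost estimation straightforward: weight input and output tokens by their respective rates. A back-of-the-envelope sketch; the rates below are placeholders, not Groq's actual prices:

```python
def estimate_llm_cost(input_tokens: int, output_tokens: int,
                      in_rate: float, out_rate: float) -> float:
    """Estimate USD cost; in_rate/out_rate are USD per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. 50k input + 10k output tokens at hypothetical $0.50/$0.80 per 1M tokens:
cost = estimate_llm_cost(50_000, 10_000, 0.50, 0.80)
```

Batch-processed requests would apply the advertised discount to the same figure; ASR cost is instead computed from hours of audio, subject to the per-request minimum.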
Technical Specifications
- Secure API key storage with iOS Keychain
- Native iOS integration with SwiftUI
- Support for streaming responses
- Automatic token counting and cost estimation
- Real-time response monitoring
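Streaming responses from OpenAI-compatible endpoints arrive as server-sent-event lines of the form `data: {json}`, terminated by `data: [DONE]`. A sketch of extracting the text deltas from such a stream, assuming that chunk format:

```python
import json

def extract_deltas(sse_lines) -> str:
    """Collect content fragments from streamed chat-completion chunks."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)
    return "".join(parts)
```

In a client, the same per-chunk loop is where incremental token counts and running cost estimates would be updated as text arrives.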