Jamba 1.5 is a family of instruction-tuned large language models developed by AI21 Labs. It comes in two versions: Jamba 1.5 Large with 94 billion active parameters and Jamba 1.5 Mini with 12 billion active parameters. It combines the Mamba structured state space model (SSM) with the traditional Transformer architecture. Both models offer a 256K-token effective context window, which AI21 Labs described as the longest among openly available models at the time of release.
The Jamba 1.5 models, both Mini and Large, are designed to handle a variety of natural language processing (NLP) tasks such as question answering, summarization, text generation, and classification. Trained on an extensive corpus, they support nine languages: English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew. With its joint SSM-Transformer structure, Jamba 1.5 targets two major limitations of conventional Transformer models: high memory requirements for long context windows and slower processing as the context grows.
| Aspect | Details |
| --- | --- |
| Base Architecture | Hybrid Transformer-Mamba architecture with a Mixture-of-Experts (MoE) module |
| Model Variants | Jamba-1.5-Large (94B active parameters, 398B total) and Jamba-1.5-Mini (12B active parameters, 52B total) |
| Layer Composition | 9 blocks, each with 8 layers; 1:7 ratio of Transformer attention layers to Mamba layers |
| Mixture of Experts (MoE) | 16 experts, selecting the top 2 per token for dynamic specialization |
| Hidden Dimensions | 8192 hidden state size |
| Attention Heads | 64 query heads, 8 key-value heads |
| Context Length | Supports up to 256K tokens, optimized for memory with significantly reduced KV cache memory |
| Quantization Technique | ExpertsInt8 for MoE and MLP layers, allowing efficient use of INT8 while maintaining high throughput |
| Activation Function | Integration of Transformer and Mamba activations, with an auxiliary loss to stabilize activation magnitudes |
| Efficiency | Designed for high throughput and low latency, optimized to run on 8x80GB GPUs with 256K context support |
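To see why the 1:7 attention-to-Mamba ratio matters for long contexts, the table's figures can be plugged into a rough KV-cache estimate. This is a back-of-the-envelope sketch, not AI21's own accounting. It assumes that the layer and head figures in the table all describe one model (they appear to correspond to Jamba-1.5-Large), that the head dimension equals hidden size divided by query heads, that cache entries are stored in 16 bits, and that a single sequence is being served. Only attention layers keep a KV cache, so the Mamba layers contribute nothing to this number.

```python
# Rough KV-cache estimate from the table's figures (a sketch under assumptions).
NUM_BLOCKS = 9
LAYERS_PER_BLOCK = 8
ATTENTION_PER_BLOCK = 1        # 1:7 attention-to-Mamba ratio inside each block
HIDDEN_SIZE = 8192
QUERY_HEADS = 64
KV_HEADS = 8
CONTEXT_TOKENS = 256 * 1024    # 256K tokens
BYTES_PER_VALUE = 2            # assumption: 16-bit cache entries

# Assumption: head dimension = hidden size / number of query heads.
HEAD_DIM = HIDDEN_SIZE // QUERY_HEADS


def kv_cache_bytes(num_attention_layers, context_tokens=CONTEXT_TOKENS):
    """Bytes needed to cache keys and values for one sequence."""
    per_token_per_layer = 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # keys + values
    return num_attention_layers * context_tokens * per_token_per_layer


total_layers = NUM_BLOCKS * LAYERS_PER_BLOCK
attention_layers = NUM_BLOCKS * ATTENTION_PER_BLOCK
mamba_layers = total_layers - attention_layers

GIB = 1024 ** 3
hybrid_gib = kv_cache_bytes(attention_layers) / GIB
# Hypothetical comparison: the same stack if every layer were an attention layer.
all_attention_gib = kv_cache_bytes(total_layers) / GIB

print(f"{total_layers} layers: {attention_layers} attention, {mamba_layers} Mamba")
print(f"KV cache at 256K tokens: {hybrid_gib:.0f} GiB (hybrid) "
      f"vs {all_attention_gib:.0f} GiB (all-attention)")
```

Under these assumptions the hybrid layout needs about one eighth of the cache memory that an all-attention stack of the same depth would, which is the "significantly reduced KV cache memory" the table refers to.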
Jamba 1.5 was designed for a range of applications, such as sentiment analysis, summarization, and paraphrasing. It is accessible via the AI21 Studio API, Hugging Face, or cloud partners, which makes it deployable in various environments. It can also be fine-tuned on domain-specific data for better results; the model weights can be downloaded from Hugging Face.
One way to access the models is through AI21’s Chat interface:
Here’s the link: Chat Interface
This is just a small sample of the model’s question-answering capabilities.
You can send requests and get responses from Jamba 1.5 in Python using the API Key.
To get your API key, open Settings in the left sidebar of the homepage, then select API key.
Note: You’ll get $10 free credits, and you can track the credits you use by clicking on ‘Usage’ in the settings.
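# In a notebook cell; from a shell, run: pip install ai21
!pip install ai21

from ai21 import AI21Client
from ai21.models.chat import ChatMessage

# A single user message for the chat endpoint
messages = [ChatMessage(content="What's a tokenizer in 2-3 lines?", role="user")]

# Paste your API key between the quotes
client = AI21Client(api_key='')

# stream=True returns the reply incrementally, chunk by chunk
response = client.chat.completions.create(
    messages=messages,
    model="jamba-1.5-mini",
    stream=True
)

# Print each piece of text as it arrives; skip chunks that carry no text
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")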
A tokenizer is a tool that breaks down text into smaller units called tokens, words, subwords, or characters. It is essential for natural language processing tasks, as it prepares text for analysis by models.
It’s straightforward: We send the message to our desired model and get the response using our API key.
Note: You can also use the jamba-1.5-large model instead of jamba-1.5-mini by changing the model argument.
Jamba 1.5 blends the strengths of the Mamba and Transformer architectures. With its scalable design, high throughput, and extensive context handling, it is well-suited for diverse applications ranging from summarization to sentiment analysis. By offering accessible integration options and optimized efficiency, it enables users to work effectively with its modelling capabilities across various environments. It can also be finetuned on domain-specific data for better results.
Q1. What is Jamba 1.5?
Ans. Jamba 1.5 is a family of large language models designed with a hybrid architecture combining Transformer and Mamba elements. It includes two versions, Jamba-1.5-Large (94B active parameters) and Jamba-1.5-Mini (12B active parameters), optimized for instruction-following and conversational tasks.
Q2. What context length does Jamba 1.5 support?
Ans. Jamba 1.5 models support an effective context length of 256K tokens, made possible by their hybrid architecture and a quantization technique called ExpertsInt8. Together, these allow the models to handle long-context data with reduced memory usage.
Q3. What is ExpertsInt8?
Ans. ExpertsInt8 is a custom quantization method that compresses model weights in the MoE and MLP layers to INT8 format. This technique reduces memory usage while maintaining model quality and is compatible with A100 GPUs, enhancing serving efficiency.
Q4. Is Jamba 1.5 publicly available?
Ans. Yes, both Large and Mini are publicly available under the Jamba Open Model License. The models can be accessed on Hugging Face.
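To make the ExpertsInt8 answer above more concrete, here is a minimal sketch of the general idea behind INT8 weight quantization. It is a generic symmetric, per-tensor scheme written in plain Python for illustration. It is not AI21's ExpertsInt8 implementation, and the function names and sample weights are made up. Each weight is stored as an 8-bit integer plus one shared scale, and is multiplied back by that scale when needed.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (int8-range values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer values."""
    return [v * scale for v in quantized]


weights = [0.50, -1.27, 0.03, 0.90]   # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("max round-trip error:", max_error)
```

Each stored value takes one byte instead of the two bytes of a 16-bit float, so the quantized layers need roughly half the weight memory. The cost is a rounding error of at most half a quantization step per weight.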