Base64 is a binary-to-text encoding scheme that represents binary data as a string of ASCII characters. It’s often used to send data over channels designed for text, such as email and JSON-based APIs, so that binary data like images and files doesn’t get corrupted along the way. The name Base64 comes from its alphabet of 64 characters: A-Z, a-z, 0-9, +, and / (with = used for padding). In recent years, it has been widely used in multimodal AI applications, embedded systems, cloud-based services, and web development. In this article, we’ll learn more about Base64 and how to use it.
Base64 is mostly used when binary data (e.g., images, videos, or model weights) needs to pass through text-based infrastructure without being altered or corrupted. But why is it such a popular choice among the many other encodings? Let’s try to understand.
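In short, Base64 output consists only of printable ASCII characters, so it can pass through text-only systems unchanged; it is lossless, so decoding always returns the exact original bytes; and it is widely supported, with built-in implementations in the standard libraries of many languages, such as Python, Java, and Go.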
Here’s how Base64 compares with other well-known encodings:
| Encoding | Purpose | Use Case | Size Impact |
|---|---|---|---|
| Base64 | Binary to text | Embedding images/files in HTML, JSON, etc. | ~33% increase |
| Hex | Binary to hexadecimal text | Debugging, network traces | ~100% increase |
| Gzip | Compression | Actual size reduction for text/binary | Depends on the compression ratio |
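To see the size figures from the table in practice, here is a small sketch that encodes the same binary payload with Base64 and with hex and compares the lengths. The 768-byte payload is an arbitrary choice for illustration; Gzip is left out because its output size depends on the content being compressed.

```python
import base64

# Arbitrary binary payload for illustration: all 256 byte values, repeated 3 times
data = bytes(range(256)) * 3

b64 = base64.b64encode(data)  # Base64: 4 output characters per 3 input bytes
hexed = data.hex()            # Hex: 2 output characters per input byte

print(len(data), len(b64), len(hexed))  # 768 1024 1536
print(f"Base64 overhead: {len(b64) / len(data) - 1:.0%}")  # Base64 overhead: 33%
print(f"Hex overhead: {len(hexed) / len(data) - 1:.0%}")   # Hex overhead: 100%
```

The 4-for-3 ratio is why Base64 output is always about a third larger than its input, regardless of what the bytes contain.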
Now let’s try to understand how Base64 works. Here’s a walkthrough of the step-by-step conversion of the string “Hello” into its Base64 format.
| Character | ASCII Decimal Value | Binary Value (8 bits) |
|---|---|---|
| H | 72 | 01001000 |
| e | 101 | 01100101 |
| l | 108 | 01101100 |
| l | 108 | 01101100 |
| o | 111 | 01101111 |
So, in binary, the string “Hello” becomes 01001000 01100101 01101100 01101100 01101111.
That’s 5 characters × 8 bits = 40 bits.
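This first step can be reproduced in a few lines of Python: look up each character’s byte value and format it as 8 binary digits. It’s a sketch for checking the table above, nothing more.

```python
text = "Hello"
raw = text.encode("ascii")  # one byte per character

# Print each character, its ASCII code, and that code as 8 binary digits
for ch, byte in zip(text, raw):
    print(ch, byte, format(byte, "08b"))

# Concatenate all the 8-bit groups into one bit string
bits = "".join(format(byte, "08b") for byte in raw)
print(len(bits))  # 40
```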
Base64 works on 6-bit blocks rather than 8-bit bytes, so we regroup the 40 bits into chunks of 6 instead of 8. Here are the original 8-bit chunks:
01001000 01100101 01101100 01101100 01101111
Regrouped into 6-bit chunks, the same bits look like this:
010010 000110 010101 101100 011011 000110 1111
Since 40 isn’t divisible by 6, we get 6 full 6-bit blocks and 1 leftover 4-bit block. We pad that last block with 2 zero bits on the right to make it a full 6-bit chunk:
010010 000110 010101 101100 011011 000110 111100
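The regrouping and zero-padding step can be sketched like this, assuming the bit string is built the same way as in the walkthrough (8 binary digits per byte of “Hello”):

```python
# Bit string for "Hello" (5 bytes = 40 bits)
bits = "".join(format(byte, "08b") for byte in b"Hello")

# Number of zero bits needed to reach a multiple of 6 (2 for a 40-bit input)
pad = (-len(bits)) % 6
padded = bits + "0" * pad

# Split the padded bit string into 6-bit chunks
chunks = [padded[i:i + 6] for i in range(0, len(padded), 6)]
print(pad, chunks)
```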
Each 6-bit block can represent 2^6 = 64 different values, so every block’s decimal value lies between 0 and 63.
| 6-bit Binary | Decimal |
|---|---|
| 010010 | 18 |
| 000110 | 6 |
| 010101 | 21 |
| 101100 | 44 |
| 011011 | 27 |
| 000110 | 6 |
| 111100 | 60 |
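Converting each chunk to its decimal value is a one-liner in Python, since `int()` accepts a base argument. The chunk list below is simply copied from the walkthrough:

```python
chunks = ["010010", "000110", "010101", "101100", "011011", "000110", "111100"]

# int(s, 2) parses a string of binary digits; 6 bits always gives a value in 0..63
values = [int(chunk, 2) for chunk in chunks]
print(values)  # [18, 6, 21, 44, 27, 6, 60]
```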
Next, we map each decimal value to a character using the standard Base64 alphabet (defined in RFC 4648): values 0 to 25 map to A-Z, 26 to 51 map to a-z, 52 to 61 map to 0-9, 62 maps to +, and 63 maps to /.
| Decimal | Base64 Character |
|---|---|
| 18 | S |
| 6 | G |
| 21 | V |
| 44 | s |
| 27 | b |
| 6 | G |
| 60 | 8 |
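The lookup itself is just string indexing once the alphabet is written out. This sketch builds the standard alphabet from Python’s `string` constants and maps the decimal values from the walkthrough:

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, then "+" and "/" (64 characters)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

values = [18, 6, 21, 44, 27, 6, 60]
encoded = "".join(ALPHABET[v] for v in values)
print(len(ALPHABET), encoded)  # 64 SGVsbG8
```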
This gives us “SGVsbG8”, but we’re not done yet.
Base64 encodes every 3 input bytes (24 bits) as 4 output characters. Since our original string has 5 bytes (not a multiple of 3), the output is padded with “=” so that its length is a multiple of 4 characters.
5 bytes = 40 bits -> 6 full Base64 characters + 1 character from the zero-padded leftover 4 bits = 7 characters -> + 1 “=” padding character -> 8 characters in total
Final Base64 encoded string: “Hello” -> SGVsbG8=
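Putting all the steps together gives a from-scratch encoder that mirrors the walkthrough: bytes to bits, zero-pad to a multiple of 6, map each 6-bit value through the alphabet, then append “=” until the length is a multiple of 4. This is a teaching sketch (building a string of bits is slow); the final loop checks it against Python’s built-in `base64.b64encode`.

```python
import base64
import string

# Standard Base64 alphabet: values 0-63 map to these characters in order
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_manual(data: bytes) -> str:
    """Teaching version of standard Base64 encoding; use base64.b64encode in real code."""
    # Step 1: concatenate the 8-bit binary form of every byte
    bits = "".join(format(byte, "08b") for byte in data)
    # Step 2: pad with zero bits on the right until the length is a multiple of 6
    bits += "0" * ((-len(bits)) % 6)
    # Step 3: read each 6-bit chunk as a number 0-63 and look it up in the alphabet
    out = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    # Step 4: append "=" until the output length is a multiple of 4
    return out + "=" * ((-len(out)) % 4)


print(b64encode_manual(b"Hello"))  # SGVsbG8=

# Cross-check against the standard library on a few inputs of different lengths
for sample in [b"", b"M", b"Ma", b"Man", b"Hello", b"Hello World"]:
    assert b64encode_manual(sample) == base64.b64encode(sample).decode("ascii")
```

Inputs of 1, 2, and 3 bytes exercise the three padding cases (“==”, “=”, and none).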
Now that you understand how Base64 works, let me show you how to implement it in Python. We’ll first try to encode and decode some text, and then do the same with an image.
Let’s encode this simple text using Base64 and then decode the encoded string back to its original form.
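```python
import base64

# Text encoding: b64encode works on bytes, so encode the str (UTF-8 by default) first
message = "Hello World"
encoded = base64.b64encode(message.encode())
print("Encoded:", encoded)

# Decoding it back: b64decode returns bytes, so decode them into a str
decoded = base64.b64decode(encoded).decode()
print("Decoded:", decoded)
```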
Output:
Encoded: b'SGVsbG8gV29ybGQ='
Decoded: Hello World
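Note that `b64encode` returns a `bytes` object, which is why the encoded value prints with a `b'...'` prefix. A small pair of helpers that make the text encoding explicit and return a plain `str` can be handy; the names `encode_text` and `decode_text` are hypothetical, chosen just for this sketch.

```python
import base64


def encode_text(text: str, encoding: str = "utf-8") -> str:
    """Text -> bytes (in the given encoding) -> Base64 -> ASCII str."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")


def decode_text(b64: str, encoding: str = "utf-8") -> str:
    """Base64 str -> original bytes -> text in the given encoding."""
    return base64.b64decode(b64).decode(encoding)


print(encode_text("Hello World"))  # SGVsbG8gV29ybGQ=
print(decode_text(encode_text("héllo wörld")))  # non-ASCII text round-trips via UTF-8
```

Both sides must agree on the text encoding: Base64 only preserves bytes, so decoding with a different encoding than was used to encode can garble non-ASCII characters.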
In vision-related applications, especially with Vision Language Models (VLMs), images are often encoded in Base64 so they can be embedded directly in text-based payloads, such as the JSON bodies of API requests.
Here’s simple Python code to encode and decode images.
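```python
from PIL import Image
import base64
import io

# Load the image with Pillow
img = Image.open("example.jpeg")

# Re-save it into an in-memory buffer as JPEG bytes.
# Note: this re-encodes the image (JPEG is lossy), so the bytes may differ from
# the file on disk; reading the file in "rb" mode keeps the original bytes instead.
buffered = io.BytesIO()
img.save(buffered, format="JPEG")
img_bytes = buffered.getvalue()

# Base64-encode the bytes and turn the ASCII result into a str
img_base64 = base64.b64encode(img_bytes).decode("utf-8")
print("Base64 String:", img_base64[:100], "...")  # print only the first 100 characters
```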
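When the image is headed for a VLM API, some APIs accept it as a data URL (`data:<mime type>;base64,<payload>`, the scheme defined in RFC 2397) inside a JSON body. The sketch below shows that wrapping; the function names and the JSON keys `prompt` and `image` are hypothetical placeholders, since every provider defines its own request schema.

```python
import base64
import json


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Base64-encode raw image bytes and wrap them in a data URL (RFC 2397)."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def build_request_body(prompt: str, image_bytes: bytes) -> str:
    """Build a JSON request body; the field names here are hypothetical."""
    # "prompt" and "image" are placeholder keys, not any specific provider's schema
    return json.dumps({"prompt": prompt, "image": to_data_url(image_bytes)})


# Usage with a real file: reading in "rb" mode keeps the original bytes (no re-encoding)
# with open("example.jpeg", "rb") as f:
#     body = build_request_body("Describe this image.", f.read())

# JPEG files start with the bytes FF D8 FF, which encode to "/9j/"
print(to_data_url(b"\xff\xd8\xff"))  # data:image/jpeg;base64,/9j/
```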
We can also decode the Base64 string back into an image using the code below.
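```python
from PIL import Image
import base64
import io
from IPython.display import display, Image as IPythonImage

# Assume `img_base64` is the Base64 string produced by the previous snippet
img_data = base64.b64decode(img_base64)  # back to the JPEG bytes

# Optional: open the bytes with Pillow for further processing (resizing, saving, etc.)
img = Image.open(io.BytesIO(img_data))

# Render the bytes inline; this display call works in Jupyter/IPython environments
display(IPythonImage(data=img_data))
```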
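Outside a notebook, or when the string comes from an untrusted source, it can help to validate the input and to accept an optional data-URL prefix before decoding. Here is a minimal sketch, assuming the input is either a bare Base64 string or a `data:<mime type>;base64,...` URL; `decode_image_b64` is a hypothetical helper name. It relies on the real `validate=True` option of `base64.b64decode`, which rejects characters outside the Base64 alphabet.

```python
import base64
import binascii


def decode_image_b64(s: str) -> bytes:
    """Decode a Base64 image string, accepting an optional data-URL prefix."""
    if s.startswith("data:"):
        # Keep only the payload after the comma in "data:<mime>;base64,<payload>"
        s = s.split(",", 1)[1]
    try:
        # validate=True raises binascii.Error on characters outside the alphabet
        return base64.b64decode(s, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid Base64 image data: {exc}") from exc


# Usage: write the decoded bytes straight to disk
# with open("decoded.jpeg", "wb") as f:
#     f.write(decode_image_b64(img_base64))
```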
To learn more about Base64 and find many more encoders and decoders, you can refer to this site.
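Although Base64 is useful across many domains, a few things are worth keeping in mind when working with it: it is an encoding, not encryption, so it offers no security on its own; it makes data roughly 33% larger; and the + and / characters of the standard alphabet are not safe in URLs and filenames, where the URL-safe variant (which uses - and _ instead) is the usual choice.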
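The sketch below illustrates these common Base64 caveats with Python’s standard `base64` module: standard output can contain + and /, which `urlsafe_b64encode` replaces with - and _; anyone can reverse the encoding without a key; and every 3 input bytes become 4 output characters. The input bytes are chosen only so that the output contains + and /.

```python
import base64

data = b"\xfb\xff"  # chosen so the standard encoding uses "+" and "/"

standard = base64.b64encode(data)          # b'+/8='
url_safe = base64.urlsafe_b64encode(data)  # b'-_8=' ("+" -> "-", "/" -> "_")
print(standard, url_safe)

# Base64 is an encoding, not encryption: decoding needs no key
print(base64.b64decode(standard) == data)          # True
print(base64.urlsafe_b64decode(url_safe) == data)  # True

# Size overhead: 300 input bytes become 400 output characters
print(len(base64.b64encode(bytes(300))))  # 400
```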
In an era where models can “see” as well as “read”, Base64 has quietly become a cornerstone of multimodal systems. By bridging the gap between binary data and text-only systems, it plays a key role in how data is exchanged. In vision-language workflows, it standardizes how images travel from mobile clients to cloud GPUs, while preserving reproducibility and easing integration.
Making images compatible with text-based infrastructure has always been a complex problem to solve. Base64 encoding provides a practical solution to this, enabling image transmission over APIs and packaging datasets for training.