Codey: Google’s Generative AI for Coding Tasks

Ajay Last Updated : 04 Aug, 2023


Introduction

Since introducing ChatGPT, its generative conversational AI, OpenAI has released many generative AI products and Large Language Models built on its GPT family of models. Following the success of conversational models, developers have been working on LLMs that can either write code themselves or assist developers while they code. Many companies, OpenAI among them, are researching such LLMs so that developers can build applications faster with models that know programming languages. Google built Codey, a fine-tuned version of PaLM 2 that can perform a variety of coding tasks.


Learning Objectives

Understanding how Codey was built
Learning how to work with Codey on the Google Cloud Platform
Understanding the type of prompts that Codey can take
Exploring and engaging with the different models within Codey
Leveraging Codey to generate working Python code
Testing Codey to see how it identifies and solves errors in code

This article was published as a part of the Data Science Blogathon.

Introduction
What is Codey?
Getting Started with Codey
Code Generation with Codey
Code Chat with Codey
Conclusion
Frequently Asked Questions

What is Codey?

Codey is one of the foundation models recently built and released by Google. It is a fine-tuned version of the PaLM 2 Large Language Model, further trained on a large corpus of high-quality code and coding documentation. Google claims that Codey can code in more than 20 programming languages, including Python, C, JavaScript, and Java. Codey has also been used to enhance Google products such as Google Colab and Android Studio.

Codey is built to serve three purposes. The first is code completion: Codey analyzes the code you are writing and makes context-aware suggestions based on it. The second is code generation: given a prompt, Codey can generate complete, working code in a supported language. The third is code chat: you can provide your code to Codey and have a conversation with it about that code. Codey is now available to the general public through Vertex AI on the Google Cloud Platform.


Getting Started with Codey

To work with Google’s Codey, we need a Google Cloud Platform (GCP) account. GCP hosts Vertex AI, the service that provides access to the models developed by Google as well as open-source models fine-tuned by Google. Google has made its recently announced foundation models, including PaLM 2, Codey, Chirp, and Imagen, available there, so GCP users can access them through Vertex AI.

After creating a Google Cloud Platform account, we must enable the Vertex AI API. To do this, go to APIs & Services -> Library and search for “Vertex AI API”, which appears in the search results as in the first screenshot below. Click on it, then click the blue “Enable API” button. Once the API is enabled, the page will look similar to the second screenshot.

Figure: Enabling the Vertex AI API in the Google Cloud Platform

With the API enabled, we can work with any of the AI services Google provides on Vertex AI, including foundation models such as Chirp, Imagen, and Codey.

Code Generation with Codey

This section looks at code generation with the Codey model. The only prerequisite is enabling the Vertex AI API in GCP, which we have already done. The code walkthrough takes place in Google Colab. Before writing any code, we install the packages needed to work with Vertex AI using pip.

!pip install shapely

# Quote the requirement: an unquoted >= is treated by the shell as output redirection
!pip install "google-cloud-aiplatform>=1.27.0"
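The `>=1.27.0` pin asks pip for version 1.27.0 or newer. To confirm which version actually ended up in the Colab runtime, a standard-library check such as the sketch below can help. The helper names (`parse_version`, `meets_minimum`, `installed_version`) are hypothetical, not part of any Google package. The code compares version components as integers because plain string comparison would wrongly rank "1.9.0" above "1.27.0". For full PEP 440 handling, the `packaging` library's `Version` class is the more robust choice.

```python
from __future__ import annotations

from importlib import metadata
from itertools import takewhile


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.27.0' into (1, 27, 0), stopping at suffixes such as 'dev0'."""
    parts: list[int] = []
    for piece in version.split("."):
        # Keep only the leading digits of each component, e.g. '0rc1' -> '0'
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version: str, minimum: str) -> bool:
    """True if `version` is at least `minimum`, compared numerically."""
    a, b = parse_version(version), parse_version(minimum)
    # Pad with zeros so that '1.27' and '1.27.0' compare as equal
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("google-cloud-aiplatform")
if found is None:
    print("google-cloud-aiplatform is not installed")
else:
    print(found, "OK" if meets_minimum(found, "1.27.0") else "older than 1.27.0")
```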

The shapely and google-cloud-aiplatform packages are the only two we need to start working with the Codey model. Next, we import the packages and authenticate our Google account so that Colab can use our GCP credentials to run the Codey model on Vertex AI.

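# Let Colab use our Google account's GCP credentials
from google.colab import auth as google_auth
google_auth.authenticate_user()


import vertexai
from vertexai.preview.language_models import CodeGenerationModel


# Use your own Project ID and a region where the Codey models are available
vertexai.init(project="your_project_id", location="us-west1")
# Generation settings reused by every predict() call
parameters = {
    "temperature": 0.3,  # lower values make the output less random
    "max_output_tokens": 1024,  # upper limit on the length of the response
}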

Firstly, we import the auth module from the google.colab package as google_auth. Calling authenticate_user() lets Colab use our credentials when running the Codey model on Vertex AI.
Then we import vertexai, the Vertex AI SDK package that gives access to Google's machine learning and generative AI models. We also import CodeGenerationModel from vertexai.preview.language_models, which is the class we will work with.
Next, we initialize Vertex AI for the project we will work with: we pass our Project ID as the project argument and one of the available regions as the location argument to vertexai.init().
We also define the generation parameters up front. temperature controls how random, or "creative", the model's output is, and max_output_tokens limits the length of the output generated by the Large Language Model.
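To make the temperature parameter concrete: in the usual sampling setup, the model's raw scores for each candidate next token are divided by the temperature before a softmax turns them into probabilities. Values below 1 sharpen the distribution toward the top candidate, and values above 1 flatten it. The sketch below shows that general calculation on made-up scores. It illustrates the standard technique and is not Vertex AI's internal implementation. Under this view, the article's choice of 0.3 leans toward focused, repeatable code rather than variety.

```python
from __future__ import annotations

import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Many APIs treat 0 as greedy decoding rather than dividing by it
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.3, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```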

We will take this imported model, i.e., the CodeGenerationModel, and test it by passing a prompt.

Prompt

# Load Codey's code-generation model (the fine-tuned PaLM 2 model)
code_model = CodeGenerationModel.from_pretrained("code-bison@001")
response = code_model.predict(
    prefix="""Write a code in Python to count the occurrences of the
    word "rocket" from a given input sentence using Regular Expressions""",
    **parameters
)

# The generated code is returned in the response's text attribute
print(f"Response from Model: {response.text}")

Here we load the model for code generation. We are working with a pre-trained model from Google, “code-bison@001”, which is the fine-tuned PaLM 2 model responsible for generating code from a prompt.
To pass the prompt, we call the model's predict() function and supply the prompt through the prefix argument. Here we ask the model to generate Python code that counts the occurrences of the word “rocket” using regex.
We also pass the previously defined parameters to predict().
The result is stored in the response variable, and we read the generated code from its text attribute.
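Rather than copying responses by hand, the predict-then-text pattern above can be wrapped in a helper that also strips markdown formatting. This is a sketch under two assumptions: that the model sometimes returns its code inside a markdown code fence followed by explanatory prose (check your own responses), and that predict() behaves as used in this article, taking a prefix plus keyword parameters and returning an object with a text attribute. `generate_code`, `extract_code`, `FakeCodeModel` and `FakeResponse` are hypothetical names; the fake model exists only so the helper can be tried offline.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any, Protocol

TICKS = "`" * 3  # markdown code-fence marker
# Captures the body of the first fenced block, with or without a language tag
FENCE_RE = re.compile(TICKS + r"[\w+-]*\n(.*?)" + TICKS, re.DOTALL)


class SupportsPredict(Protocol):
    """Anything with predict(prefix=..., **params) returning an object that has .text."""

    def predict(self, prefix: str, **kwargs: Any) -> Any: ...


def extract_code(response_text: str) -> str:
    """Return the first fenced code block in a response, or the whole text if none."""
    match = FENCE_RE.search(response_text)
    return (match.group(1) if match else response_text).strip()


def generate_code(model: SupportsPredict, task: str, language: str = "Python", **params: Any) -> str:
    """Prompt the model for code and return just the code portion of its answer."""
    prompt = f"Write {language} code to {task}."
    response = model.predict(prefix=prompt, **params)
    return extract_code(response.text)


@dataclass
class FakeResponse:
    text: str


class FakeCodeModel:
    """Offline stand-in for CodeGenerationModel so the helper can be tried without GCP."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def predict(self, prefix: str, **kwargs: Any) -> FakeResponse:
        self.calls.append({"prefix": prefix, **kwargs})
        body = "import re\nprint(len(re.findall(r'rocket', 'one rocket')))\n"
        return FakeResponse(text=f"{TICKS}python\n{body}{TICKS}\nThis counts the word.")


fake = FakeCodeModel()
print(generate_code(fake, "count the word rocket in a sentence", temperature=0.3))
```

In Colab you would pass the real code_model from above instead, for example `generate_code(code_model, "count the occurrences of the word 'rocket' in a sentence using regular expressions", **parameters)`. As with any generated code, read it before running it.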

The output of the code can be seen below.

We get Python code as the output for our prompt: the model has written a Python script that matches the query we supplied. The simplest way to test it is to copy the response, paste it into another Colab cell, and run it. The output of that run is shown below.

When the code runs, we give it the sentence “We have launched our first rocket. The rocket is built with 100% recycled material. We have successfully launched our rocket into space.” The output correctly reports that the word “rocket” occurs three times. In this way, Codey's CodeGenerationModel can be used to create quick, working code from simple prompts.
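The generated script itself appears only as a screenshot, so it is not reproduced here. For reference, a minimal regex solution to the same task might look like the sketch below. It is our own illustration, not Codey's output, and `count_word` is a hypothetical name. The `\b` word boundaries keep words such as "rockets" from being counted, and `re.IGNORECASE` is a design choice the original prompt did not ask for.

```python
import re


def count_word(sentence: str, word: str = "rocket") -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `sentence`."""
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, sentence, flags=re.IGNORECASE))


sentence = (
    "We have launched our first rocket. The rocket is built with 100% "
    "recycled material. We have successfully launched our rocket into space."
)
print(count_word(sentence))  # 3
```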

Code Chat with Codey

The Code Chat feature lets us interact with Codey about our code: we provide the code and then chat with the Codey model about it. We might want to understand the code better, such as how it works, or ask for alternative approaches, which Codey can suggest by looking at the current code. If we run into an error, we can provide both the code and the error, and Codey will analyze them and suggest a fix. To use Code Chat, we navigate to Vertex AI in GCP and open the Language section under Generative AI Studio, as shown below.

Navigating to the Language Section

This time we take a no-code approach: earlier we used Python and the Vertex AI API for code generation, and now we will work directly in the GCP console. To chat with Codey about our code, we select the Code Chat option inside the blue box in the center, which takes us to the interface below.

Here we see that the model in use is “codechat-bison@001”. Next, we will introduce an error into the regular expression code we generated earlier, give Code Chat the broken code along with the error it causes, and see whether the model corrects it. In the Python regex code, we replace re.findall() with re.find(), a function that does not exist in Python's re module, and run the code. We get the following error.
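The failure is easy to reproduce outside the chat. Python's `re` module provides `search()`, `match()`, `findall()` and `finditer()`, but no `find()`, so the call fails with an `AttributeError` before any matching happens. A minimal reproduction, using a shortened sentence for illustration:

```python
import re

text = "We have launched our first rocket."

# Working version: re.findall() returns a list of every non-overlapping match.
print(len(re.findall(r"\brocket\b", text)))  # 1

# Broken version: the re module defines no find() function, so the lookup itself fails.
try:
    re.find(r"\brocket\b", text)
except AttributeError as err:
    print(f"AttributeError: {err}")  # AttributeError: module 're' has no attribute 'find'
```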

The output shows that the error comes from the re.find() call. We now pass this modified code and the error message to Code Chat in the “Enter a prompt to begin a conversation” box. As soon as we press Enter, we get the following output.

The Codey model has analyzed our code, identified where the error was, and provided corrected code for us to work with. In this way, Code Chat can identify and fix errors, explain code, and suggest best coding practices.
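Code Chat in the console is the approach shown here. If you prefer to assemble the "code plus error" message programmatically, for example to paste into Code Chat, the sketch below runs the broken snippet, captures its traceback with the standard `traceback` module, and formats both into one prompt. `run_and_capture` and `build_debug_prompt` are hypothetical helper names, and the code only builds text; it does not call Codey. Because it uses `exec()`, only run snippets you trust.

```python
from __future__ import annotations

import traceback


def run_and_capture(code: str) -> str | None:
    """Execute a snippet in a fresh namespace; return its traceback text, or None on success."""
    try:
        exec(compile(code, "<snippet>", "exec"), {})
    except Exception:
        return traceback.format_exc()
    return None


def build_debug_prompt(code: str, error: str) -> str:
    """Combine a snippet and its error into one message for a code-chat model."""
    return (
        "The following Python code raises an error.\n\n"
        f"Code:\n{code}\n"
        f"Error:\n{error}\n"
        "Explain the cause and provide a corrected version."
    )


# The deliberately broken regex snippet from this section (re.find instead of re.findall)
buggy = 'import re\nprint(len(re.find(r"rocket", "one rocket")))\n'
error = run_and_capture(buggy)
if error is not None:
    print(build_debug_prompt(buggy, error))
```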

Conclusion

In this article, we looked at Codey, one of Google's recently announced, publicly available foundation models and a fine-tuned version of PaLM 2 (Google's homegrown generative Large Language Model). Because Codey is fine-tuned on high-quality code, it can write code in more than 20 programming languages, including Python, Java, and JavaScript. Codey is available through Vertex AI, which we can use either from the GCP console or programmatically through the Vertex AI API; this article demonstrated both approaches.


Some of the key takeaways from this article include:

Codey is built by fine-tuning PaLM 2 on a large corpus of high-quality code.
It is capable of writing code in more than 20 different programming languages.
With Codey, we can generate code from a simple prompt and even chat with the model to correct the errors that arise in the code.
Codey also offers Code Completion, where the model analyzes the code you are writing and offers useful suggestions.
We can work with Codey directly through the UI from the Generative AI Studio in the Vertex AI provided by the GCP.

Frequently Asked Questions

Q1. Is Codey capable of generating code from scratch?

A. Absolutely. You only need to provide a prompt describing the code you want and the language to write it in. Codey's Code Generation model then uses this prompt to generate code for the application you described, in your chosen language.

Q2. Is Codey based on the PaLM 2?

A. Yes. The Codey foundation model is a fine-tuned version of PaLM 2, trained on a vast dataset containing code in many different languages.

Q3. What is Codey capable of?

A. Codey can do three main things. The first is code generation from a given prompt. The second is code completion, where the model looks at the code you are writing and offers useful suggestions. The third is code chat, where you provide your code (and any error it produces) and then chat with the Codey model about it.

Q4. Are Codey and GitHub Copilot the same?

A. They are not the same, but they are similar in some ways. GitHub Copilot is based on OpenAI's models and offers code autocompletion and code suggestions. Codey can do this as well, and it also has a Code Chat feature that lets users ask the model questions about their code.

Q5. What are the models used in Codey?

A. At present, Codey contains three models. The codechat-bison@001 for the Code Chat tasks, the code-gecko@001 for the Code Completion tasks, and code-bison@001 for the Code Generation tasks.
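If you script against Vertex AI, keeping the task-to-model mapping in one place avoids scattering version strings through a notebook. A minimal sketch follows, with hypothetical names `CODEY_MODELS` and `model_for`. The IDs are exactly the three listed above, and the returned string is what you would pass to a model class's `from_pretrained()`, as in the code-generation example. Check Vertex AI for newer model versions before relying on these IDs.

```python
# Model IDs as listed in this article; newer versions may exist on Vertex AI.
CODEY_MODELS = {
    "generation": "code-bison@001",
    "completion": "code-gecko@001",
    "chat": "codechat-bison@001",
}


def model_for(task: str) -> str:
    """Return the Codey model ID for a task: 'generation', 'completion' or 'chat'."""
    try:
        return CODEY_MODELS[task.lower()]
    except KeyError:
        valid = ", ".join(sorted(CODEY_MODELS))
        raise ValueError(f"unknown task {task!r}; expected one of: {valid}") from None


print(model_for("chat"))  # codechat-bison@001
```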

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Ajay

I work as a developer in the field of data science. I constantly spend time learning new things, be it in AI, data science, or cybersecurity. Deep learning and machine learning are two topics that I find particularly fascinating, and Python is my preferred programming language. Cybersecurity is another field I have recently started exploring. I have experience with large-scale data analysis, and I have a solid grasp of a variety of deep learning and machine learning approaches, including neural networks, regression models, and natural language processing. I'm eager to take on new challenges and make a meaningful contribution to the industry, so I'm constantly seeking ways to broaden and deepen my knowledge and skills in the subject.
