OpenAI ChatGPT vs Google Bard - How do they compete?

A deep dive into the clash of the AI titans: OpenAI's ChatGPT and Google Bard!

One of my hobbies is to evaluate new-to-market beta/preview products and write feedback. I have been doing this for a while now, and it has earned me early access to some of the best products in the market. Over just the past month, I got access to about six different products, and interestingly, most of them were in the AI space. That tells us something about the future. To me, AI will be baked into almost all products in the coming years.

But today, I want to share what I learned about OpenAI's ChatGPT and Google Bard. As always, I want to start with a WHY. We can't forget Google's code red, the chaos around Bard's announcement, and Sundar Pichai, who seemed unfazed by the disruption caused by ChatGPT while making aggressive moves on AI behind the scenes.

Google Search has been around for over two decades now, and it has been the go-to search engine for most of us. While Google can get us the best results, we still must choose what's best for us; Google leaves us with a paradox of choice. But why all the rush with Bard? Is Google trying to prove that they are still the leaders in the AI space? Or is it just a case of Google trying to catch up with the competition?

The way I decided to evaluate these two products was to interview the chatbots as if I were recruiting a Data Engineer. I evaluated them on the following criteria:

  1. Understanding-oriented - Ability to understand the question and design a solution.
  2. Task-oriented - Ability to implement a given design.
  3. Learning-oriented - How can we learn using these products? How reliable are their teaching skills?
  4. Information-oriented - How accurate are they in providing information?

Understanding-oriented

I asked both ChatGPT and Bard to design a solution for a simple problem.

Question: I receive 1000s of events every second. I would like to consume them and load them into my data mart. I have Kafka, Databricks, and Prefect. Can you design a solution for me with these tools?

ChatGPT's response was comprehensive. It understood that this was a design question related to data engineering and that it needed to build an ETL workflow. It also understood that the tools I mentioned were the ones available to me. Its response included a high-level overview of the solution and the steps to implement it.

On the other hand, Bard's response was absurd. Bard did not understand that I was looking for a design, but it did understand that I was asking about the three technologies (Kafka, Databricks, and Prefect). So I believe Bard tried to stitch those three things together with some random code.

Bard provided three code snippets, one for each technology, i.e., Kafka, Databricks, and Prefect. It did not explain how those three pieces of code fit together to form a solution. For example, it used readStream() to read from Kafka, but it scheduled the job in Prefect to run every few minutes. So it's pretty clear Bard didn't know that readStream() starts a streaming job that only needs to be triggered once and then keeps running.

Have you seen candidates trying to make up something on the fly while interviewing? This is what it looked like.
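
For reference, a Structured Streaming read from Kafka only needs to be started once and then runs continuously; scheduling it every few minutes defeats the purpose. Here is a minimal sketch of my own (not code from either chatbot; the broker address, topic name, and paths are placeholders):

    # Continuous read from Kafka in a Databricks notebook.
    # `spark` is the SparkSession that Databricks provides; broker/topic/paths are placeholders.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # Once start() is called, this query runs until it is explicitly stopped;
    # there is nothing for an orchestrator to re-trigger every few minutes.
    query = (
        events.selectExpr("CAST(value AS STRING) AS event_json")
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .start("/mnt/datamart/events")
    )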

Task-oriented

Following the same question, I asked both ChatGPT and Bard to implement the solution.

ChatGPT wrote templated code that actually included an example Kafka producer and the PySpark code in Databricks to consume it. I am not sure why it didn't use streaming, but it did read from Kafka and write to a DataFrame. TBH, I wouldn't have suggested streaming either, knowing the cost of streaming on Databricks.
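
The batch approach, which is roughly what ChatGPT went for, reads the offsets that are currently available and then finishes, so it pairs naturally with a scheduled orchestrator. A minimal sketch of my own (placeholder broker, topic, and path):

    # One-shot batch read from Kafka: processes the currently available offsets and completes.
    events = (
        spark.read
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .option("startingOffsets", "earliest")
        .option("endingOffsets", "latest")
        .load()
    )

    (events.selectExpr("CAST(value AS STRING) AS event_json")
        .write.mode("append")
        .format("delta")
        .save("/mnt/datamart/events"))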

ChatGPT made a blunder in the last step. While orchestrating the workflow, ChatGPT used Prefect and redid the Kafka consumption step. It didn't remember that this was already coded in the Databricks notebook.
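
What I expected in that last step was a Prefect flow that simply triggers the Databricks job wrapping the notebook, instead of consuming Kafka a second time. A rough sketch of my own, assuming Prefect 2.x and the Databricks Jobs run-now API (the workspace URL, job ID, and token handling are placeholders):

    import os

    import requests
    from prefect import flow, task

    DATABRICKS_HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
    JOB_ID = 123  # placeholder: the Databricks job that wraps the existing notebook

    @task
    def trigger_databricks_job() -> int:
        """Kick off the already-written Databricks notebook via the Jobs run-now API."""
        resp = requests.post(
            f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
            headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
            json={"job_id": JOB_ID},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["run_id"]

    @flow
    def load_events_to_datamart():
        # Orchestration only: the Kafka consumption already lives in the Databricks notebook.
        run_id = trigger_databricks_job()
        print(f"Started Databricks run {run_id}")

    if __name__ == "__main__":
        load_events_to_datamart()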

Bard did write code that was conceptually correct; interestingly, it used Delta tables. But it wasn't close to functional. To be fair, though, Bard at some point called out that it's not designed to write code.

Bard couldn't write much code, but ChatGPT did an amazing job.

Learning-oriented

Switching gears, I asked both ChatGPT and Bard to explain Kubernetes to me. I asked in a few different ways, keeping different personas in mind (a programmer who is new to Kubernetes, a functional analyst who is not a tech nerd, and a few more).

For the programmer persona, ChatGPT structured its explanation around what Kubernetes is, what it does, how it does it, and why. It also explained the different components of Kubernetes and dove right into teaching me the concepts.

For the functional analyst persona, ChatGPT gave me an analogy involving a car and shipping containers, explaining that Kubernetes is a container orchestration tool that helps us manage containers.

Bard doesn't get into details until we ask for them. It always gives a brief overview of the topic, and unless we ask more questions, it does not give out more details. I think that is a problem because most of us don't know what we don't know. As we ask more questions, though, Bard does a good job of explaining the topic.

Similar to ChatGPT, Bard also used an analogy, comparing Kubernetes to a chef in a kitchen. It explained the different components of Kubernetes as well.

Information-oriented

I tried a few questions like

Question: Let's assume that I need to move data from an on-prem server via Azure ExpressRoute to an Azure Data Lake Storage account that is behind a private endpoint. What is the best possible solution?

and

Question: The system design involves moving petabytes of data from an on-prem data center to Azure Data Lake Storage Gen2, encrypting sensitive fields, transforming the data, and moving it to a data mart to generate reports. What are the tools and technologies I can use in Azure to achieve this?

Both ChatGPT and Bard fared equally well. Maybe these weren't complex enough for them. One interesting thing I noticed is that the information they provided was nearly identical, and I observed this repeatedly. That tells me both tools summarize from a similar source (maybe Microsoft Learn, in the case of Azure). Interestingly, both ChatGPT and Bard suggested Azure Key Vault for encryption, even though I specifically asked for field-level encryption. Maybe neither tool looked outside Azure for a solution.
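
Field-level encryption needs more than pointing at a vault: something has to fetch a key and encrypt the individual values before they land in the lake. A rough sketch of what I had in mind (my own illustration, not from either chatbot; the vault URL and secret name are placeholders, and the secret is assumed to hold a Fernet key):

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    from cryptography.fernet import Fernet

    # Fetch a symmetric key stored as a secret in Key Vault (placeholder vault URL and secret name).
    vault = SecretClient(
        vault_url="https://<my-vault>.vault.azure.net",
        credential=DefaultAzureCredential(),
    )
    fernet = Fernet(vault.get_secret("field-encryption-key").value)

    def encrypt_field(value: str) -> str:
        """Encrypt one sensitive field before it is written to the data lake."""
        return fernet.encrypt(value.encode("utf-8")).decode("utf-8")

    # Example: encrypt only the sensitive column and leave the rest of the record readable.
    record = {"customer_id": 42, "ssn": "123-45-6789"}
    record["ssn"] = encrypt_field(record["ssn"])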

But ChatGPT did a little better than Bard. For example, ChatGPT mentioned a SHIR (self-hosted integration runtime) for Azure Data Factory since the storage is behind a private endpoint. Bard did not mention SHIR until I asked about it. This is typical of Bard, as I said above: it does not give out information unless we ask for it.

My Verdict

Bard is not yet a great tool for understanding-oriented or task-oriented use cases. Bard can't write code yet and does not understand design questions, but it can definitely answer complex questions when we are asking for information or looking to learn. With Bard, you need to be specific about what you want to know, and you need to keep asking questions to get the right information.

With ChatGPT, you get exactly what you want. What would have taken a developer 15-20 minutes of searching Google and stitching a few pieces of code together took ChatGPT less than a second. That is like riding a bicycle vs. flying a Boeing. I, like many others, am honestly impressed by what OpenAI has done. ChatGPT can actually design, write code, and provide recommendations.

One thing to note: Google has more data and more resources than OpenAI. On paper, Google could make Bard improve faster than ChatGPT. Only time will tell if that turns out to be true.

With all that said, ChatGPT has disrupted search. Bing was an object of ridicule for years, but with patience and perseverance, Bing powered by ChatGPT has become a formidable competitor. Googling is now a thing of the past. Welcome to the world of AI chatbots.

Post Script

I intentionally didn't copy-paste all the responses for you to read; I wanted to give you a feel for how the conversations went.

That said, here is one full conversation I had with both ChatGPT and Bard.

Question: I receive 1000s of events every second. I would like to consume them and load them into my data mart. I have Kafka, Databricks, and Prefect. Can you design a solution for me with these tools?


ChatGPT: Yes, I can help you design a solution to consume events from Kafka, process them using Databricks, and load them into your data mart using Prefect. Here's a high-level overview of the solution:

  1. Create a Kafka consumer to read messages from the Kafka topic(s) that contain the events you want to process.
  2. Use Databricks to process the events. This could involve performing data transformations, aggregations, or any other data processing you need to do.
  3. Once the data has been processed, write it to a file or database that will be used to load it into your data mart.
  4. Create a Prefect workflow to orchestrate the entire process. The workflow should include the Kafka consumer, the Databricks processing step, and the data loading step.
  5. Deploy the solution to a cloud environment, such as AWS or Azure, to ensure scalability and reliability.

Here are some more detailed steps to implement this solution:

Step 1: Create a Kafka consumer

To create a Kafka consumer, you'll need to use a Kafka client library in your preferred programming language. Some popular options include the Java Kafka client, the Kafka-Python library, and the Confluent Python client.

The consumer should be configured to read messages from the Kafka topic(s) that contain the events you want to process. You'll also need to set up any necessary authentication and encryption to ensure the security of your data.

Step 2: Use Databricks to process the events

Databricks is a cloud-based platform that provides a range of tools for processing big data. You can use Databricks to write Spark code that processes the events you receive from Kafka.

To get started with Databricks, you'll need to set up a Databricks workspace and create a cluster. You can then write Spark code in the Databricks notebook environment, which allows you to run Spark jobs on the cluster.

Your Spark code should read the events from the Kafka consumer and perform any necessary data processing. You can use Spark's built-in functions or write custom code to perform transformations, aggregations, or any other processing you need.

Step 3: Write the processed data to a file or database

Once the data has been processed, you'll need to write it to a file or database that will be used to load it into your data mart. The choice of storage depends on your specific requirements, but common options include CSV files, Parquet files, and databases such as PostgreSQL or MySQL.

You can use Databricks to write the processed data to your chosen storage format. Databricks has built-in support for writing data to many common file formats and databases.

Step 4: Create a Prefect workflow

Prefect is a Python-based workflow management system that allows you to orchestrate complex data workflows. You can use Prefect to create a workflow that coordinates the Kafka consumer, the Databricks processing step, and the data loading step.

To create a Prefect workflow, you'll need to write Python code that defines the steps in the workflow and their dependencies. You can use Prefect's built-in tasks for common operations such as reading from Kafka and writing to databases, or write custom tasks if necessary.

Step 5: Deploy the solution to a cloud environment

To ensure scalability and reliability, you should deploy your solution to a cloud environment such as AWS or Azure. This will allow you to easily scale up or down as needed and take advantage of cloud-based services such as load balancing and auto-scaling.

You can use tools such as Terraform or CloudFormation to automate the deployment of your solution to the cloud. You'll need to provision resources such as EC2 instances, VPCs, and security groups, and configure them to work together in your architecture.

And here is what Bard had to say:


Sure, I can help you design a solution to consume and load thousands of events every second from Kafka to Databricks using Prefect.

The following are the steps involved in the solution:

  1. Create a Kafka topic. The first step is to create a Kafka topic to store the events. You can do this using the following command:

         kafka-topics.sh --create --topic events --partitions 3 --replication-factor 3

  2. Create a Databricks notebook. The next step is to create a Databricks notebook to consume the events from Kafka and load them to the data mart. You can do this by opening a new notebook in Databricks and then running the following code:

         import pyspark.sql.functions as F
         from pyspark.sql.types import StructType, StructField, IntegerType, StringType

         # Create a schema for the events
         event_schema = StructType([
             StructField("event_id", IntegerType(), True),
             StructField("event_time", StringType(), True),
             StructField("event_type", StringType(), True),
             StructField("event_data", StringType(), True)
         ])

         # Create a DataFrame from the events
         events = (spark.readStream.format("kafka")
                   .option("kafka.bootstrap.servers", "localhost:9092")
                   .option("topic", "events")
                   .load())

         # Cast the event data to a string
         events = events.withColumn("event_data", F.cast(events["event_data"], StringType()))

         # Load the events to the data mart
         events.writeStream.format("delta").option("checkpointLocation", "/tmp/events").start()

  3. Create a Prefect pipeline. The final step is to create a Prefect pipeline to schedule the Databricks notebook. You can do this by creating a new pipeline in Prefect and then adding the following task:

         task = Prefect.run(
             "consume_events",
             python_callable=lambda: spark.sql("SELECT * FROM events").show(),
             schedule="*/1 * * * *"
         )

     This will schedule the Databricks notebook to run every minute and consume the latest events from Kafka.

This is just a basic solution, and you may need to adjust it to fit your specific needs. For example, you may need to increase the number of partitions in the Kafka topic if you are expecting a high volume of events. You may also need to add additional steps to the Databricks notebook to clean and transform the data before loading it to the data mart.

Would you like to contact me with ideas, questions, or for collaboration?

I am passionate about data and have been working with and exploring data tools for over 16 years now. Data technologies amaze me, and I continue learning every day. Happy to help!