Gemini vs ChatGPT: Which is better?

We all know how ChatGPT’s popularity triggered an apparent “code red” at Google. Not long after the launch of ChatGPT, Google introduced its own conversational AI tool, Bard. In its latest development, Google has released a new version of its chatbot that claims to match, if not outshine, OpenAI’s star performer.

Introduction to Gemini AI

Gemini is Google’s most powerful large language model yet, and it boasts multimodal abilities. The AI tool has been launched in three variants, designed and built to cater to diverse usage depending on user requirements: Gemini Nano, Gemini Pro, and the yet-to-be-released Gemini Ultra.

Gemini AI has been designed to be more powerful and capable than its predecessor, with multimodal features that seamlessly process text, images, video, audio, and code.

Learn more about Google Gemini AI.

Gemini vs ChatGPT comparison

Before we delve deeper into a comparative analysis and determine how Gemini and ChatGPT stack up against each other, it is important to note that Gemini Pro, the versatile, middle-tier version with its advanced text-based capabilities, has been integrated into Google Bard, enabling more accurate and high-quality responses.

The current version of Bard is comparable to ChatGPT, which is built on GPT-3.5, a somewhat more limited model and the predecessor of GPT-4, which powers the more advanced variant, ChatGPT Plus.

Read about how Google Bard compares against ChatGPT.

Gemini Pro vs GPT-3.5

  1. Language: To evaluate general language understanding, the MMLU (Massive Multitask Language Understanding) benchmark assesses the ability to answer questions across 57 diverse subjects. Gemini Pro scored 79.13% against GPT-3.5’s 70% on this test. (Notably, Google reports that Gemini Ultra, the top-tier model, is the first to outperform human experts on MMLU.)
  2. Arithmetic Reasoning: Gemini Pro scored a very high 86.5%, beating GPT-3.5’s 57.1% on the GSM8K benchmark, which assesses arithmetic reasoning with grade-school math problems.
  3. Code Generation: On the HumanEval code generation benchmark, Gemini Pro again scored higher than GPT-3.5, with 67.7% against 48.1%.

The only benchmark where GPT-3.5 fared better than Gemini Pro was the MATH category, where GPT-3.5 scored 34.1% against Gemini Pro’s slightly lower 32.6%.
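The HumanEval scores above are pass@1 rates: a model gets one attempt per problem, and a completion counts only if it passes that problem’s hidden unit tests when executed. A minimal sketch of that scoring scheme, assuming illustrative `Problem` fields and made-up sample tasks rather than the actual HumanEval harness:

```python
from dataclasses import dataclass

@dataclass
class Problem:
    prompt: str      # function signature + docstring shown to the model
    completion: str  # model-generated body (one sample per problem => pass@1)
    test: str        # hidden assertions run against the completed function

def passes(problem: Problem) -> bool:
    """A completion passes only if the hidden tests run without raising.
    Real harnesses sandbox this exec; here it runs in-process for brevity."""
    try:
        namespace: dict = {}
        exec(problem.prompt + problem.completion, namespace)  # define the function
        exec(problem.test, namespace)                         # run hidden tests
        return True
    except Exception:
        return False

problems = [
    Problem("def add(a, b):\n", "    return a + b\n", "assert add(2, 3) == 5"),
    Problem("def sub(a, b):\n", "    return a + b\n", "assert sub(5, 3) == 2"),  # buggy
]
pass_at_1 = sum(passes(p) for p in problems) / len(problems)
print(f"pass@1 = {pass_at_1:.0%}")  # one of the two samples passes -> 50%
```

The real benchmark uses 164 hand-written Python problems and estimates pass@k over multiple samples, but the pass/fail core is the same: execute and check.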

Gemini Ultra vs GPT-4

Text Processing

  1. General Capabilities: Gemini Ultra posted an impressive 90.0% that beat GPT-4’s 86.4% in a 5-shot setting on the MMLU.
  2. Reasoning: The Big-Bench Hard benchmark evaluates a model’s capacity for multi-step reasoning across various challenging tasks. Gemini Ultra scored 83.6%, nearly neck and neck with GPT-4’s 83.1% in a similar 3-shot API configuration.

In DROP (Discrete Reasoning Over Paragraphs), the assessment for reading comprehension, Gemini Ultra received a score of 82.4 (variable shots), surpassing GPT-4’s 80.9 (3-shot setting).

Gemini Ultra demonstrated its strong skills on the GSM8K with 94.4%, while GPT-4 scored 92% in a 5-shot CoT setting.

  3. Math: The more challenging math assessment includes algebra and geometry problems. Gemini Ultra led slightly at 53.2% against GPT-4’s 52.9% in a 4-shot setting.
  4. Code Generation: On the HumanEval and Natural2Code code generation benchmarks, Gemini Ultra demonstrated superior Python code generation, scoring 74.4% and 74.9% (0-shot settings), while GPT-4 scored 67% and 73.9%, respectively.
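The “5-shot” and “CoT” settings cited above describe how the benchmark prompt is built: the model is shown several worked examples (“shots”), each with written-out chain-of-thought reasoning, before the target question. A rough sketch of that prompt assembly, with a made-up exemplar (real GSM8K evaluation uses five fixed exemplars):

```python
# Each exemplar pairs a question with a fully worked answer; the reasoning
# text is what makes this "chain of thought" rather than plain few-shot.
EXEMPLARS = [
    ("Tom has 3 apples and buys 2 more. How many apples does he have?",
     "Tom starts with 3 apples. He buys 2 more, so 3 + 2 = 5. The answer is 5."),
    # a 5-shot setting would include five such pairs
]

def build_prompt(question: str) -> str:
    """Concatenate worked exemplars, then the target question with an
    open-ended 'A:' the model completes with its own reasoning."""
    parts = [f"Q: {q}\nA: {a}" for q, a in EXEMPLARS]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

print(build_prompt("A pen costs $2. What do 4 pens cost?"))
```

Because the exemplars are part of the prompt rather than the model, score differences across shot settings (0-shot, 4-shot, 5-shot CoT) measure the same model under different amounts of in-context guidance.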

Multimedia Content Processing

  1. Image Processing (pixel only): On the MMMU benchmark (multi-discipline college-level reasoning), Gemini Ultra achieved a 59.4% 0-shot pass@1 score, surpassing GPT-4V’s 56.8%.

In the VQAV2 for natural image understanding, Gemini Ultra scored 77.8%, slightly surpassing the 77.2% by GPT-4V (0-shot setting).

Gemini Ultra scored 82.3% in OCR on natural images in the TextVQA, while GPT-4V received a 78% score.

In the DOCVQA evaluation for document understanding, Gemini Ultra led with a 90.9% score; GPT-4V scored 88.4%.

  2. Video Processing: For mathematical reasoning in visual contexts, Gemini Ultra (pixel only) scored 53% on MathVista, surpassing GPT-4V’s 49.9%.

In the VATEX benchmark for English video captioning, Gemini Ultra earned a CIDEr score of 62.7% (4-shot setting), beating GPT-4V, which earned a score of 56%.

  3. Audio Processing: On the CoVoST 2 benchmark for automatic speech translation (21 languages), Gemini Pro achieved a BLEU score of 40.1, significantly outperforming OpenAI’s Whisper v2 at 29.1.

FLEURS is a benchmark for automatic speech recognition (62 languages), where Gemini Pro’s word error rate was 10 percentage points lower than Whisper v3’s 17.6%.
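The word error rate used by FLEURS is the word-level edit distance between the model’s transcript and a reference transcript, divided by the reference length, so lower is better. A minimal sketch of the computation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: minimum insertions + deletions + substitutions
    to turn the hypothesis into the reference, over reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
```

Note that WER can exceed 100% when the hypothesis contains many spurious words, which is why speech benchmarks report it alongside the language mix being tested.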

To conclude this comparative analysis, both models perform remarkably well, holding their own across many tasks. Gemini’s multimodality gives it a notable advantage in image, video, and audio processing.

The exceptional capabilities of the two AI behemoths reflect the rapid advancements in the field, setting the groundwork for future innovations. These models will continue to evolve, ultimately transforming the way we interact with technology and the world around us.
