Technology

I tried ChatGPT’s new o1-preview model, but you shouldn’t switch just yet

18 September 2024

Calvin Wankhede / Android Authority

With competition from Google’s Gemini and Anthropic’s Claude AI models heating up, OpenAI has found itself in the midst of an identity crisis. Once the undisputed leader in large language models (LLMs), it’s now scrambling to maintain its position at the top. New models like GPT-4o and GPT-4o mini have stemmed the exodus to competing AI chatbots, but OpenAI is under constant pressure to keep innovating. The company has done just that with o1-preview, a new AI model series that excels at complex reasoning and emulating human thought. How good is it? I put it to the test to find out.

What’s the new o1-preview ChatGPT model all about?

OpenAI’s o1-preview and o1-mini are the latest models available within ChatGPT, designed for complex reasoning tasks and problem-solving. As their names suggest, these models are not generational successors to GPT-4 or any of OpenAI’s earlier language models. In fact, GPT-4o will not only continue to exist but also remain the default model for all chats.

Unlike prior models that responded to your prompts as quickly as possible, the o1 series has been designed to spend more time thinking through problems, similar to a human’s thought process. This naturally leads to higher accuracy in prompts related to math and coding, but it is also useful for real-world questions and scenarios, as I’ll showcase in my testing below.

We first heard about the o1 model series in July, when Reuters interviewed researchers familiar with a secretive internal project codenamed Strawberry. The goal of the project was to develop an AI capable of performing “deep research,” in line with the company’s mission to achieve artificial general intelligence (AGI). The latter refers to an AI system that’s intelligent enough to outthink humans across multiple subjects. The Strawberry project was rumored to arrive ahead of GPT-5, which is still being developed.

o1 is OpenAI’s latest model family that can break down problems and reason like a human.

The new o1 series is still a long way off from achieving true AGI. OpenAI CEO Sam Altman admitted that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.” However, it’s a big leap forward from the earliest ChatGPT release that many believed would never succeed at solving math problems or logical exercises.

While o1-preview is the latest flagship model, it’s also accompanied by a much leaner and faster o1-mini. OpenAI found that the series excels at coding, so it also released a second model that can accurately generate and debug code. Aimed mostly at developers, o1-mini is 80% cheaper than o1-preview.

o1-preview vs GPT-4o tested: Is it really better?

If you’re skeptical that o1-preview is leagues ahead of prior models, there’s good news: the chatbot does pause to think, sometimes upwards of a minute, before responding. It breaks down complex problems into chunks, which helps it correct errors.

However, there’s also bad news: the o1 series isn’t better across the board. Specifically, it cannot search the internet for new information like the older GPT-4o model, nor can it perform advanced data analysis. You also cannot upload files and images, meaning you’ll have to frontload each prompt with as much information and context as possible. OpenAI even admits that many ChatGPT users will want to stick with GPT-4o for the time being.

Setting aside these caveats, though, how does it perform? To find out, I posed a handful of confusing and complex questions to both of OpenAI’s best models. Here’s how o1-preview fared vs GPT-4o.

Prompt 1: How many legs do I have?

Starting with an easy one, I asked ChatGPT how many legs I’d have if I had 4 cows, 3 dogs, and 2 cats. The answer is obviously two, which GPT-4o put forth but only after saying I’d have 36 animal legs. By contrast, I watched the o1-preview model “think” for five seconds before correctly (and confidently) saying I’d have two legs. It also acknowledged that the question was a riddle.

I also posed the same question to OpenAI’s smaller GPT-4o mini model and it failed miserably. It simply said I’d have 38 legs, adding mine to the animals’ count.
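For reference, the arithmetic behind the two wrong-looking numbers can be sketched in a few lines of Python. This is only an illustration of where 36 and 38 come from, assuming four legs per animal; the variable names are hypothetical and not taken from any of the chatbots’ output.

```python
# Illustrative sketch: assumes every cow, dog, and cat has four legs.
animals = {"cows": 4, "dogs": 3, "cats": 2}
LEGS_PER_ANIMAL = 4

# Total legs belonging to the animals (the figure GPT-4o mentioned first).
animal_legs = sum(count * LEGS_PER_ANIMAL for count in animals.values())

# The riddle asks about *my* legs, which owning animals does not change.
my_legs = 2

# GPT-4o mini's answer corresponds to adding both figures together.
combined = animal_legs + my_legs

print(f"Animal legs: {animal_legs}")
print(f"My legs: {my_legs}")
print(f"Animal legs plus mine: {combined}")
```

The riddle only works because the animal count is a distraction: the correct answer is `my_legs`, not the sum.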

Prompt 2: Investment return calculation, while accounting for currency depreciation

Since simple prompts only require a few seconds of thinking, I decided to take things up a notch. In this prompt, I asked ChatGPT to find the better investment between two assets with differing returns and risks. The chatbot took 11 seconds to think before it responded this time. Once again, it delivered the correct answer while explaining each step.

Interestingly, GPT-4o also arrived at the same conclusion but it didn’t compute the figures on its own. Instead, it generated the Python code necessary to perform the calculations and executed it via ChatGPT’s advanced data analysis feature. So while the output is the same, the complexity is higher. Coding as a workaround also has the potential to fail quite spectacularly, as I’d soon find out.

Prompt 3: Which is better, buying a house or renting?

If you hang around financially savvy folks, you’ll know that renting vs buying a house is a highly divisive topic that involves plenty of variables, both financial and otherwise. Luckily, we can ask ChatGPT to do the math for us. The o1-preview model put 37 seconds’ worth of thought into this question and broke it down into 12 different steps.

I provided several figures, including my down payment amount, interest rate, expected return on investment if I rented instead, and more. This made the question much more complicated: ChatGPT had to first compute the cost of an $800,000 home with a $200,000 down payment. The remaining amount would be financed with a 20-year mortgage at 3.5% interest. If I rented instead, I’d be able to invest the entire $200,000 in an index fund and save any additional income after paying the rent too.
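To give a sense of the first step in that calculation, here is a minimal Python sketch of the mortgage side only. It assumes a fixed-rate loan with monthly compounding and ignores taxes, fees, and insurance; the function name and structure are illustrative and are not what either model produced. The rent and index fund side is left out because the article does not list those figures.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed-rate amortization payment, assuming monthly compounding."""
    monthly_rate = annual_rate / 12
    num_payments = years * 12
    if monthly_rate == 0:
        # No interest: the principal is simply spread over all payments.
        return principal / num_payments
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -num_payments)


# Figures from the prompt described above.
home_price = 800_000
down_payment = 200_000
annual_rate = 0.035
years = 20

loan = home_price - down_payment
payment = monthly_payment(loan, annual_rate, years)
total_interest = payment * years * 12 - loan

print(f"Loan amount: ${loan:,.0f}")
print(f"Monthly payment: ${payment:,.2f}")
print(f"Total interest over {years} years: ${total_interest:,.2f}")
```

A full rent-vs-buy comparison would also need the monthly rent, the expected index fund return, and the home’s appreciation, which is why the model’s answer ran to a dozen steps.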

The o1-preview model responded with a 1,000-word breakdown of the problem, concluding that my net worth would be higher by approximately $716,620 after 20 years if I rented instead of buying a home.

OpenAI’s prior GPT-4o model can’t keep up with o1-preview in advanced reasoning tasks.

Feeding the same prompt to GPT-4o yielded a much more disappointing outcome. The model tried to generate and run Python code to solve this problem, but failed twice before succeeding on the third try. Even then, it responded incorrectly and suggested I’d save money by buying a home instead. It only admitted fault after I pointed out a discrepancy in its calculations.

Since there are many more variables that can be involved, I also asked o1-preview to consider factors like property appreciation, maintenance costs, and taxes if I bought a home, as well as a potential 3% increase in rent payable yearly. This time, it took 142 seconds to think before responding with a plausible conclusion, which I think is very impressive.

How to use ChatGPT’s o1-preview and o1-mini models

As you may have guessed, the o1 model series requires copious amounts of computational power. And given that ChatGPT itself has been rumored to be unprofitable since its launch in 2022, it’s not surprising that OpenAI has locked o1-preview behind a paywall. In other words, you will need a ChatGPT Plus subscription to select the latest model from the dropdown menu pictured above.

In fact, the model is so expensive that OpenAI has also placed a hard cap of 50 messages per week on top of the $20 per month paywall. Once you exhaust this quota, your only option is to wait or pay for a second ChatGPT Plus account. OpenAI has imposed such rate limits in the past, especially around the time GPT-4 was first released, but this instance is the most aggressive one yet.

Luckily, the vast majority of ChatGPT prompts don’t benefit from o1’s thinking capabilities. And if you are a programmer, the o1-mini model within ChatGPT will be rolling out to the free plan in a limited capacity.

The o1-preview model is not free: you must pay for a ChatGPT Plus subscription to use it. However, the o1-mini model is available on the free tier in a limited capacity.

All in all, ChatGPT’s new o1-preview model is very impressive and worth a look if you have math and programming questions. It might not be the best choice for most tasks, or even the vast majority of tasks, but it’s the closest we have to emulating human reasoning and thought. However, the vast majority of users won’t benefit from o1-preview’s improved logical reasoning skills or math capabilities, so I can’t recommend switching to it full time. The weekly response limit and lack of web browsing support also mean I’ll continue using GPT-4o going forward. And if you only use ChatGPT a few times every day, you can easily get by with a free account.

Perplexity’s Pro Search feature also implemented multi-step reasoning a few months ago and it too delivered impressive results in my testing. If you want a peek at chain-of-thought AI reasoning without paying for it, I’d recommend trying it out since you get five Perplexity Pro searches every few hours on the free tier. I haven’t tested it against OpenAI’s o1-preview head-to-head yet, but it’s clear that competition in the AI space has pressured ChatGPT to evolve and I can’t wait to see where it’s headed next.