
Improving RLHF (Reinforcement Learning from Human Feedback) with Critique-Generated Reward Models


Language models have gained prominence in reinforcement learning from human feedback (RLHF), but current reward modeling approaches struggle to capture human preferences accurately. Traditional reward models, trained as simple classifiers, cannot reason explicitly about response quality, which limits their effectiveness in guiding LLM behavior. The primary issue is their inability to generate reasoning traces: every evaluation has to happen implicitly within a single forward pass, which hinders the model’s ability to assess the nuances of human preferences. Alternative approaches such as the LLM-as-a-Judge framework have tried to address this limitation, but they often underperform classic reward models on pairwise preference classification, highlighting the need for a more effective method.

Researchers have tried various approaches to address the challenges in reward modeling for language models. Ranking models such as Bradley-Terry and Plackett-Luce have been employed, but they struggle with intransitive preferences. Some studies directly model the probability of one response being preferred over another, while others focus on modeling rewards across multiple objectives. Recent work has proposed retaining and training the language model head as a form of regularization.

Critique-based feedback methods have also been explored, with some using self-generated critiques to improve generation quality or to serve as preference signals. However, these approaches differ from efforts to train better reward models when human preference data is available. Some researchers have investigated using oracle critiques or human-labeled critique preferences to teach language models to critique effectively.

The LLM-as-a-Judge framework, which uses a grading rubric to evaluate responses, shares similarities with critique-based methods but focuses on evaluation rather than revision. While this approach produces chain-of-thought reasoning, it usually underperforms classic reward models on pairwise preference classification tasks.

Researchers from Databricks, MIT, and the University of California, San Diego present Critique-out-Loud (CLoud) reward models, a new approach to improving language model performance in reinforcement learning from human feedback. These models generate a detailed critique of how well an assistant’s response answers a user’s query before producing a scalar reward for the response quality. This process combines the strengths of classic reward models and the LLM-as-a-Judge framework.

CLoud reward models are trained using a preference dataset containing prompts, responses, and oracle critiques. The training process involves supervised fine-tuning on oracle critiques for critique generation and the Bradley-Terry preference model for scalar reward production. To enhance performance, the researchers explore multi-sample inference techniques, particularly self-consistency, which involves sampling multiple critique-reward predictions and marginalizing across critiques for a more accurate reward estimate.

The approach aims to unify reward models and LLM-as-a-Judge methods, potentially leading to significant improvements in pairwise preference classification accuracy and win rates on various benchmarks. The researchers also examine key design choices, such as on-policy versus off-policy training, and the benefits of self-consistency over critiques for reward modeling performance.

CLoud reward models extend classic reward models by incorporating a language modeling head alongside the base model and reward head. Training proceeds in three steps: supervised fine-tuning on oracle critiques, replacing those critiques with self-generated ones, and then training the reward head on the self-generated critiques. This approach minimizes the distribution shift between training and inference. The model uses modified loss functions, including a Bradley-Terry model loss and a critique supervised fine-tuning loss. To enhance performance, CLoud models can employ self-consistency during inference, sampling multiple critiques for a prompt-response pair and averaging their predicted rewards for a final estimate.
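The two ingredients described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `bradley_terry_loss` is the standard negative log-likelihood of the Bradley-Terry model for one chosen/rejected pair, and `self_consistent_reward` averages the reward over several sampled critiques. The function names and the `sample_critique` and `reward_given_critique` callables are hypothetical stand-ins for the model's language modeling head and reward head.

```python
import math
from typing import Callable


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one.

    Under Bradley-Terry, P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written in a
    # numerically stable form that never exponentiates a large positive value.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def self_consistent_reward(
    prompt: str,
    response: str,
    sample_critique: Callable[[str, str], str],
    reward_given_critique: Callable[[str, str, str], float],
    num_samples: int = 8,
) -> float:
    """Average the reward over several sampled critiques of one prompt-response pair."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    rewards = []
    for _ in range(num_samples):
        # Hypothetical stand-in: the language modeling head samples a critique.
        critique = sample_critique(prompt, response)
        # Hypothetical stand-in: the reward head scores the response given the critique.
        rewards.append(reward_given_critique(prompt, response, critique))
    return sum(rewards) / len(rewards)
```

With equal rewards the loss is log 2, and it shrinks as the chosen response's reward pulls ahead of the rejected one. Averaging over sampled critiques is the simplest way to marginalize over them; the number of samples trades inference compute for a less noisy estimate.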

The researchers evaluated CLoud reward models against classic reward models using two key metrics: pairwise preference classification accuracy and Best-of-N (BoN) win rate. For pairwise preference classification, they used the RewardBench evaluation suite, which includes categories such as Chat, Chat-Hard, Safety, and Reasoning. The BoN win rate was assessed using ArenaHard, an open-ended generation benchmark.
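As a rough illustration of what these two metrics measure, the sketch below computes pairwise accuracy over (prompt, chosen, rejected) triples and performs Best-of-N selection with a reward function. The names `pairwise_accuracy`, `best_of_n`, and `reward_fn` are hypothetical, and this is not the RewardBench or ArenaHard harness; in particular, the win-rate judgment that compares the selected response against a baseline is not shown.

```python
from typing import Callable, Sequence


def pairwise_accuracy(
    pairs: Sequence[tuple[str, str, str]],
    reward_fn: Callable[[str, str], float],
) -> float:
    """Fraction of (prompt, chosen, rejected) triples where chosen scores higher."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = sum(
        reward_fn(prompt, chosen) > reward_fn(prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return correct / len(pairs)


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate response that the reward model scores highest."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(candidates, key=lambda response: reward_fn(prompt, response))
```

For Best-of-16, `candidates` would hold sixteen sampled responses to the same prompt, and the response returned by `best_of_n` is the one that is then judged against a reference answer.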

CLoud reward models significantly outperformed classic reward models in pairwise preference classification across all categories on RewardBench, at both the 8B and 70B model scales. This led to a substantial increase in average accuracy for CLoud models.

In the BoN evaluation on ArenaHard, CLoud models demonstrated a Pareto improvement over classic models, producing equal or significantly higher win rates. For Best-of-16, CLoud improved the win rate by 1.84 and 0.89 percentage points for the 8B and 70B models, respectively. These results suggest that CLoud reward models offer superior performance in guiding language model behavior compared to classic reward models.

This study introduces CLoud reward models, which represent a significant advance in preference modeling for language models. By preserving language modeling capabilities alongside a scalar reward head, these models explicitly reason about response quality through critique generation. The approach demonstrates substantial improvements over classic reward models in pairwise preference modeling accuracy and Best-of-N decoding performance. Self-consistency decoding proved beneficial for reasoning tasks, particularly those with short reasoning horizons. By unifying language generation with preference modeling, CLoud reward models establish a new paradigm that opens avenues for improving reward models through variable inference compute, laying the groundwork for more sophisticated and effective preference modeling in language model development.


Check out the Paper. All credit for this research goes to the researchers of this project.



Asjad is an intern consultant at Marktechpost. He is pursuing a B.Tech in mechanical engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching the applications of machine learning in healthcare.



EV Revolution News: BMW Outsells Tesla, Canadians Want Chinese Electric Cars, Mercedes PHEV




Those celebrating the news that sales of electric cars are slowing significantly might want to take a step back and see the big picture. An apt analogy might be the flat screen TV phenomenon. The first of them were big, bulky, and OMG expensive, upwards of $10,000 in the 1990s. In other words, they cost more than some new cars! And yet, today we can walk into Walmart or Costco and walk out with a 72″ smart flat screen TV for less than $500. What’s the point? Simply this: New technologies may take a while to arrive, but eventually they sweep aside everything that has gone before to become the new normal. Here are a few tidbits in the news this week about electric cars that readers may find interesting.

BMW Electric Cars Outsell Tesla In Europe

First up, Autocar reports that BMW sold more electric cars in Europe in July than Tesla. It delivered 14,869 new EVs in Europe last month, 308 more than its American rival. Tesla experienced a slump in Europe in July, with registrations of its Model Y SUV falling by 16% to 9,544 and those of its Model 3 saloon falling by 17% to 4,694. Note, though, that the first month of the quarter is usually not the best for Tesla, and as we’ll see in a moment, its 2024 numbers are quite high.

The total number of electric cars sold in the EU fell by 6% year on year, which market analyst JATO Dynamics attributed to the lack of government incentives and buyers’ concerns about resale values.

“The lack of clarity around the incentives for EVs continues to present a barrier to consumers considering an EV,” Felipe Munoz, global analyst at JATO Dynamics, told EuroNews. Germany ended its EV subsidy program in December 2023 as part of a cost-cutting drive, which prompted manufacturers to rethink their electrification strategies.

So far, BMW has managed to emerge relatively unscathed compared to its rivals, which some analysts put down to brand loyalty among customers. Others have suggested that polarizing political remarks from Elon Musk, the CEO of Tesla, may be harming sales at the American firm. Even so, Tesla is still leading the pack in year-to-date sales, with a total of 178,700 vehicles. BMW is in second place with 97,525 sales, while Volkswagen comes in third with a total of 88,445.

BMW’s electric car sales span a variety of models, which combined helped it claim the top spot in the EU in July, but the Tesla Model Y is still the best-selling EV in Europe (9,544), followed by the Volvo EX30 with 6,573 sales and the Volkswagen ID.4 with 5,295. Overall, total sales of new cars were up by 2% year on year in July.

Canadians Crave Cheap Chinese Electric Cars

We love electric cars here at CleanTechnica, where virtually every member of our staff drives an EV. We applaud their low emissions and how they are displacing gasoline, but we are also realistic enough to know that price is the engine that drives commerce. When flat screen TVs cost $10,000, few people bought them. Now that they cost a fraction of that, they are flying out the door. The same could be true of electric cars.

The Chinese have cracked the code on how to manufacture affordable electric cars, albeit with massive policy and financial support from the Chinese government, and they are turning their eyes toward other countries as they look to expand their export markets. One of those countries is Canada, which is caught in the political crosscurrents created by the decision by the US to raise tariffs on Chinese-made electric cars to 102.5%. Canada has a large auto manufacturing industry and many of the cars it makes are sold in the US. It has to be careful not to irritate Uncle Sam by declining to join the trend toward tariffs on China’s electric cars. The only electric cars currently imported into Canada are Chinese-made Teslas, which are subject to a 6% import duty.

Canada also has a vested interest in protecting the nearly $46 billion that has been invested by companies like Volkswagen and Northvolt to develop manufacturing facilities for electric cars and their components in Canada. “We’re gaining a foothold in this emerging industry, and we’re doing that in partnership with our American partners and allies. If we don’t take action now to give these facilities time to come online to allow manufacturers to build up their production capacity, we put all that investment at risk,” Brian Kingston, president of the Canadian Vehicle Manufacturers’ Association, said.

Despite all that high-level political stuff, the CBC reports that many Canadians would willingly buy a Chinese-made EV that currently retails for less than half what other cars in Canada cost. BYD debuted its Seagull EV last year at a starting price of $14,600 Canadian (about USD $11,000) for a version with 305 kilometers (190 miles) of range. The least expensive typical car available costs $38,000 Canadian. BYD is not selling its cars in Canada yet, and the price would certainly be higher there than it is in China, but the company hired a consultant last month to advise it on the “anticipated market entry of BYD into Canada.”

A car with less than 200 miles of range may be a deal breaker for some, but saving tens of thousands of dollars can offset a lot of those concerns. And 200 miles is nothing to sneer at. The number of people who drive 200 miles every day is quite small. The Seagull can be plugged into a normal wall outlet when not in use and be ready for whatever driving needs to be done in the morning.

Barbara MacLellan, who lives in British Columbia, told the CBC that an influx of Chinese vehicles would help spur competition in Canada. “The problem has been that legacy automakers haven’t stepped up to the plate on this. It’s not early days. We’re late to this. We’re seeing the climate change now.” Many CleanTechnica readers will agree with her.

Not everyone seems to be thrilled, after all. Morgan Whittall in Ontario argued it doesn’t make sense to purchase an EV from a rustic with such poor environmental insurance policies. “You’re not likely getting what you’re paying for whenever you’re saying, ‘I’m doing this for local weather change,’ however you’re shopping for it from a spot that actually isn’t contemplating how their course of is affecting the local weather,” he mentioned. The CBC, to its credit score, pushed again on these claims. China, it mentioned, emits about one third of the world’s carbon dioxide, based on the Worldwide Vitality Company, however has additionally grow to be a world chief in photo voltaic and wind power, together with electrical automobiles. A current report from the Finnish think-tank Centre for Analysis on Vitality and Clear Air projected that 2024 might be the yr emissions in China begin to fall.

Environmental groups such as Environmental Defence argue that blocking the entry of Chinese electric cars will make them more expensive and delay Canada’s transition to a low carbon economy. Last year, the Liberals committed to ending the sale of new gasoline and diesel vehicles by 2035.

Mercedes GLC 350e PHEV With 54 Miles Of Range

Flying under the radar this week was the announcement from Mercedes that it has a new plug-in hybrid with 54 miles of battery-only range: the GLC 350e. The trend today seems to be toward more cars that combine an electric motor with a gasoline engine. Naturally, the idea makes EV purists cringe, but it eliminates the range anxiety and charging fears many drivers have when they think about driving an electric car. 54 miles is enough to handle most daily driving chores, but as long as there is gasoline in the tank, the car can keep driving long after the energy stored in the battery is depleted. It’s an idea that appeals to many people.

According to Autoblog, the GLC 350e has a 24.8 kWh battery pack and is rated by the EPA as having 54 miles of range on a full charge. The Volvo XC60 Recharge is rated at just 35 miles of battery-only range. The new Mercedes can also be charged with up to 60 kilowatts of power using a DC fast charger, enough for a zero to full charge in less than 30 minutes. The GLC 350e starts at $61,050, about $10,000 more than the conventional GLC 300 4Matic, which starts at $50,600. It comes with 4Matic all-wheel drive as standard. For the extra money, drivers get a car with 313 hp and 406 lb-ft of torque. The conventional model makes do with 255 hp and 295 lb-ft. Acceleration times are identical for the two cars because of the extra weight of the battery, which has the same capacity as the original Nissan LEAF. Mercedes says the plug-in GLC is available at US dealerships now.

The Takeaway

What we have here is a mixed bag: some good news, some not so good news. It’s fair to say the world of electric cars is in flux, with China pushing the envelope of what is possible while the rest of the world is clinging to the past as long as possible. “You never count your money when you’re sitting at the table,” Kenny Rogers taught us. “There’ll be time enough for counting when the dealing’s done.” The dealing isn’t done for electric cars yet, not by a long shot. Innovations are happening every day and the picture is getting brighter all the time. A little slowing of the pace doesn’t mean the EV revolution is over. Sales of electric cars may be down where you live, but globally the trend is up nicely.

What the political wizards really need to focus on is what happens when we get to peak car. The revolution in self-driving cars is just beginning and is more of a threat to the global auto industry than electric cars are. The real question is, what will happen to the auto industry when the era of the private passenger car passes? Food for thought.






swift – Write iOS application accepting “open with” input documents


I’m trying to write an iOS application which accepts PDF documents from other applications.

I succeeded in writing an app that works in the Xcode iPhone simulator. But when I push my application to a real iPhone device, the iPhone offers my application in the list of applications I can send the PDF to, yet when I select it, the application opens but doesn’t process the document.

To achieve this, I declare my App with a DocumentGroup:

import SwiftUI

@main
struct ScannerImporterApp: App {
    var body: some Scene {
        // Scene that opens PDF documents handed to the app
        DocumentGroup(newDocument: ScannerImporterDocument()) { file in
            ContentView(doc_name: file.document.title, doc_content: file.document.text)
        }

        // Plain window shown when no document is involved
        WindowGroup {
            ContentView(doc_name: "", doc_content: "")
        }
    }
}

Then I declare the type that manages the document as follows:

import SwiftUI
import UniformTypeIdentifiers

extension UTType {
    static var exampleText: UTType {
        UTType(importedAs: "com.adobe.pdf")
    }
}

struct ScannerImporterDocument: FileDocument
{
    var text: String
    var title: String

    init(title: String = "My title", text: String = "Hello, world!")
    {
        print("TestDocument : init")
        self.title = title
        self.text = text
    }

    static var readableContentTypes: [UTType] { [.exampleText] }

    // Called when a document is opened from a file
    init(configuration: ReadConfiguration) throws {

        print("TestDocument : init 2")
        let data = configuration.file.regularFileContents

        title = configuration.file.filename!
        text = (data?.base64EncodedString())!
    }

    // Called when the document is saved
    func fileWrapper(configuration: WriteConfiguration) throws -> FileWrapper {
        print("TestDocument : init 3")
        let data = text.data(using: .utf8)!
        return .init(regularFileWithContents: data)
    }
}

With this code, when run in the iPhone simulator, the application shows a file selector view when it’s opened. And if the user sends a PDF to this app from another app, the PDF is handled by the app, which creates a new ScannerImporterDocument and provides its content in the “configuration” variable.

But when run on a real iPhone, it shows a file selector view when opened, but it doesn’t react to other apps sending a PDF to it (when this is done, the only result is that the application comes to the foreground, still showing the file selector).

Any idea how I can solve this issue?

Note that I don’t need to display the file selector view when opening the app. So if there’s another solution that doesn’t use DocumentGroup, that would be OK for me too.

Many thanks,
Brice

The many faces of impersonation fraud: Spot an imposter before it’s too late

Scams

What are some of the most common giveaway signs that the person behind the screen or on the other end of the line isn’t who they claim to be?

Our world is becoming more impersonal as it becomes more digital-centric. And because we can’t see the person or organization at the other end of an email, social media message or text, it’s easier for scammers to pretend to be something or someone they’re not. This is impersonation fraud, and it’s fast becoming one of the highest earners for cybercriminals. According to the FTC, scammers impersonating businesses and governments made $1.1 billion from their victims in 2023.

Impersonation fraud can take many forms, but with your eyes on the tell-tale signs of a scam, your personal information and hard-earned money will remain under lock and key.

What does impersonation fraud look like?

Like most fraud, impersonation scams are designed to get you to send the bad guys money, or personal/financial details which they can either sell on the dark web or use themselves to commit identity fraud. Phishing is perhaps the most obvious flavor of impersonation fraud: a scammer contacting you pretending to be a trusted entity requesting money or information.

But there are other varieties. Fake social media accounts are a growing problem, used to spread scam links and too-good-to-be-true offers. And fake mobile apps may impersonate legitimate apps to harvest personal data, flood your screen with ads or enroll you in premium-rate services.

Impersonation scams are also evolving. According to the FTC, they increasingly blur channels and techniques, so that “a fake Amazon employee might transfer you to a fake bank or even a fake FBI or FTC employee for fake help.”

How to spot the scammers

Contrary to popular belief, it’s not just the elderly who are at risk from impersonation scams. Research in the UK found that people under 35 are more likely to have been targeted by and fallen for this type of fraud. With that in mind, look out for these warning signs:

  • Requests for money: A message via email, text or DM from someone purporting to be a close family member or contact. They’ll request urgent financial assistance with some excuse, such as being stranded abroad or needing medical help. Fraudsters can hijack legitimate social media and email accounts to make it seem as if it’s really your friend/family member contacting you.
  • Remote access: In tech support fraud, an official from a tech company, telco or other legitimate-seeming organization requests access to your computer for some made-up reason, such as it being compromised with malware.
Figure 1. Example of tech support fraud
  • Access to your account: A police officer or government official contacts you out of the blue claiming money in your account needs to be analyzed as part of an investigation into money laundering, drug smuggling or some other serious crime. They offer to ‘keep it safe’ by moving it elsewhere.
  • Pressure: The person on the other end of the phone, email, text or social media channel pressures you to act immediately. They’ll try to panic you into making a decision without thinking, such as sending money to a friend in danger, or transferring it urgently to a government official. It’s a classic social engineering technique, sometimes even done in person or with a potentially frightening twist courtesy of AI tools that can be co-opted to perpetrate virtual kidnapping scams.
  • Fake couriers collecting money: An official offers to send a courier to your home address to pick up cash, cards, valuable items or PINs under all kinds of fake pretexts, such as helping your relative get out of trouble or resolving a dodgy payment on your bank card.
  • Account security alerts: These fake notifications often require you to ‘confirm’ your details first, another way for scammers to get hold of your personal and financial information.
Figure 2. Bogus notification
Figure 3. Made-up problems
  • Gift cards: You’re asked to pay fines or one-off fees by buying high-value items or gift cards. The scammer wants you to do this rather than a bank transfer, as it’s far harder to trace.
  • Fake subscription renewals: These may require you to connect the scammer to your computer to renew your subscription or process a non-existent refund.
Figure 4. Bogus alert
  • Fictitious giveaways or discounts: These require you to pay a small ‘fee’ in order to claim them. Needless to say, there is no reward or discount, as this is simply a variation on advance fee fraud.
  • Unusual language: Another tell-tale sign of phishing attempts could be poor grammar and spelling or imprecise language in messages, although with fraudsters using generative AI to create their fake messages, this is becoming less common.

Stay alert

Remember, impersonation scams are constantly evolving, so the above is certainly not an exhaustive list. The next evolution in such scams is coming thanks to AI-powered deepfakes, which can mimic the voice or even appearance of a trusted individual. These are already tricking office workers into making big-money corporate fund transfers to accounts under the control of cybercriminals. And the technology is being used to impersonate trusted individuals on social media in order to trick followers into making rash investments. As deepfakes become cheaper and more accessible, they could also be used in smaller scale fraud.

With any impersonation fraud, the key is: be skeptical, slow down, and independently verify they are who they say they are. Do this by reaching out to the organization or individual directly; don’t reply to an email or phone number listed on the initial message. And never hand over money or personal information unless you’ve confirmed the contact is legitimate.



Stream data to Amazon S3 for real-time analytics using the Oracle GoldenGate S3 handler



Modern business applications rely on timely and accurate data, with increasing demand for real-time analytics. There is a growing need for efficient and scalable data storage solutions. Data is at times stored in different datasets and needs to be consolidated before meaningful and complete insights can be drawn from it. This is where replication tools help move the data from its source to the target systems in real time and transform it as necessary to help businesses with consolidation.

In this post, we provide a step-by-step guide for installing and configuring Oracle GoldenGate for streaming data from relational databases to Amazon Simple Storage Service (Amazon S3) for real-time analytics using the Oracle GoldenGate S3 handler.

Oracle GoldenGate for Oracle Database and Big Data adapters

Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. It captures and applies transactional changes in real time, minimizing latency and keeping target systems synchronized with source databases. It supports data transformation, allowing modifications during replication, and works with various database systems, including SQL Server, MySQL, and PostgreSQL. GoldenGate supports flexible replication topologies such as unidirectional, bidirectional, and multi-master configurations. Before using GoldenGate, make sure you have reviewed and adhere to the license agreement.

Oracle GoldenGate for Big Data provides adapters that facilitate real-time data integration from different sources to big data services like Hadoop, Apache Kafka, and Amazon S3. You can configure the adapters to control the data capture, transformation, and delivery process based on your specific requirements to support both batch-oriented and real-time streaming data integration patterns.

GoldenGate provides special tools called S3 event handlers to integrate with Amazon S3 for data replication. These handlers allow GoldenGate to read from and write data to S3 buckets. This option lets you use Amazon S3 for GoldenGate deployments across on-premises, cloud, and hybrid environments.

Solution overview

The following diagram illustrates our solution architecture.

In this post, we walk you through the following high-level steps:

  1. Install GoldenGate software on Amazon Elastic Compute Cloud (Amazon EC2).
  2. Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
  3. Replicate the data to Amazon S3 using the GoldenGate for Big Data S3 handler.

Prerequisites

You must have the following prerequisites in place:

Install GoldenGate software on Amazon EC2

You need to run GoldenGate on EC2 instances. The instances must have enough CPU, memory, and storage to handle the anticipated replication volume. For more details, refer to Operating System Requirements. After you determine the CPU and memory requirements, select a current generation EC2 instance type for GoldenGate.

Use the following formula to estimate the required trail space:

trail disk space = transaction log volume in 1 hour x number of hours down x .4
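As a worked example, the formula can be wrapped in a small Python helper. This is only a sketch: the function name, the use of gigabytes as the unit, and the sample figures (10 GB of transaction log per hour, a 24-hour outage) are assumptions for illustration, not values from this post.

```python
def trail_disk_space_gb(
    log_volume_gb_per_hour: float,
    hours_down: float,
    factor: float = 0.4,
) -> float:
    """Estimate trail file space: hourly log volume x hours down x 0.4.

    The 0.4 factor comes from the formula above; the GB unit is an
    assumption, and any consistent unit works the same way.
    """
    if log_volume_gb_per_hour < 0 or hours_down < 0:
        raise ValueError("inputs must be non-negative")
    return log_volume_gb_per_hour * hours_down * factor


if __name__ == "__main__":
    # Hypothetical workload: 10 GB of transaction log per hour, 24 hours down.
    estimate = trail_disk_space_gb(10, 24)
    print(f"Estimated trail space: {estimate:.1f} GB")
```

For these sample figures the estimate works out to roughly 96 GB, which would be the storage to provision for trail files on the EC2 instance in addition to the software itself.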

When the EC2 instance is up and running, download the following GoldenGate software from the Oracle GoldenGate Downloads page:

  • GoldenGate 21.3.0.0
  • GoldenGate for Big Data 21c

Use the following steps to upload and install the file from your local machine to the EC2 instance. Make sure that your IP address is allowed in the inbound rules of the security group of your EC2 instance before starting a session. For this use case, we install GoldenGate for Classic Architecture and Big Data. See the following code:

scp -i pem-key.pem 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip ec2-user@hostname:~/.
ssh -i pem-key.pem ec2-user@hostname
unzip 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip

Install GoldenGate 21.3.0.0

Complete the following steps to install GoldenGate 21.3 on an EC2 instance:

  1. Create a house listing to put in the GoldenGate software program and run the installer:
    mkdir /u01/app/oracle/product/OGG_DB_ORACLE
    cd fbo_ggs_Linux_x64_Oracle_shiphome/Disk1
    
    ls -lrt
    total 8
    drwxr-xr-x. 4 oracle oinstall 187 Jul 29 2021 install
    drwxr-xr-x. 12 oracle oinstall 4096 Jul 29 2021 stage
    -rwxr-xr-x. 1 oracle oinstall 918 Jul 29 2021 runInstaller
    drwxrwxr-x. 2 oracle oinstall 25 Jul 29 2021 response

  2. Run runInstaller:
    [oracle@hostname Disk1]$ ./runInstaller
    Starting Oracle Universal Installer.
    Checking Temp space: must be greater than 120 MB.   Actual 193260 MB    Passed
    Checking swap space: must be greater than 150 MB.   Actual 15624 MB    Passed

A GUI window will pop up to install the software.

  3. Follow the instructions in the GUI to complete the installation process. Provide the directory path you created as the home directory for GoldenGate.

After the GoldenGate software installation is complete, you can create the GoldenGate processes that read the data from the source. First, you configure OGG EXTRACT.

  1. Create an extract parameter file for the source Oracle database. The following code is the sample file content:
    [oracle@hostname Disk1]$ vi eabc.prm
    
    -- Extract group name
    EXTRACT EABC
    SETENV (TNS_ADMIN = "/u01/app/oracle/product/19.3.0/network/admin")
    
    -- Extract database user login
    
    USERID ggs_admin@mydb, PASSWORD "********"
    
    -- Local trail on the remote host
    EXTTRAIL /u01/app/oracle/product/OGG_DB_ORACLE/dirdat/ea
    IGNOREREPLICATES
    GETAPPLOPS
    TRANLOGOPTIONS EXCLUDEUSER ggs_admin
    TABLE scott.emp;

  2. Add the EXTRACT at the GoldenGate prompt by running the following command:
    GGSCI> ADD EXTRACT EABC, TRANLOG, BEGIN NOW

  3. After you add the EXTRACT, check the status of the running programs with the info all command.

You will see that the EXTRACT status is in the STOPPED state, as shown in the following screenshot; this is expected.

  4. Start the EXTRACT process as shown in the following figure.

The status changes to RUNNING. The following are the different statuses:

  • STARTING – The process is starting.
  • RUNNING – The process has started and is running normally.
  • STOPPED – The process has stopped either normally (in a controlled manner) or due to an error.
  • ABENDED – The process has been stopped in an uncontrolled manner. An abnormal end is called ABEND.

Starting the extract process creates a trail file in the location specified in the extract parameter file.

  5. You can verify this by using the command stats <>, as shown in the following screenshot.
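The process statuses listed above lend themselves to a simple health check. The following sketch filters saved info all output and reports every process that is not RUNNING. The helper name `report_not_running` is hypothetical, and the column layout (Program, Status, Group) is assumed from the sample output shown in this post.

```shell
#!/bin/bash
# Sketch: list GoldenGate processes whose status is not RUNNING.
# report_not_running is a hypothetical helper; it reads `info all`
# output from the given files, or from standard input if none are given.
report_not_running() {
  awk '
    # Only data rows start with a program type; headers and prompts are skipped
    $1 ~ /^(MANAGER|EXTRACT|REPLICAT)$/ && $2 != "RUNNING" {
      group = ($3 == "" ? "-" : $3)   # MANAGER rows have no group name
      print $1, group, $2
    }
  ' "$@"
}
```

For example, if info_all.txt holds captured output, `report_not_running info_all.txt` prints one line per stopped or abended process, such as `EXTRACT EABC STOPPED`. How you capture the output (for instance, by piping the command into ggsci) is an assumption to verify in your own environment.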

Install GoldenGate for Big Data 21c

In this step, we install GoldenGate for Big Data on the same EC2 instance where we installed the GoldenGate Classic Architecture.

  1. Create a directory to install the GoldenGate for Big Data software. To copy the .zip file, follow these steps:
    mkdir /u01/app/oracle/product/OGG_BIG_DATA
    
    unzip 214000_ggs_Linux_x64_BigData_64bit.zip
    tar -xvf ggs_Linux_x64_BigData_64bit.tar
    
    GGSCI> CREATE SUBDIRS
    GGSCI> EDIT PARAM MGR
    PORT 7801
    
    GGSCI> START MGR

This will start the MANAGER program. Now you can install the dependencies required for the REPLICAT to run.

  2. Go to /u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader and run the sh file with the latest version of aws-java-sdk. This script downloads the AWS SDK, which provides client libraries for connectivity to the AWS Cloud.
    [oracle@hostname DependencyDownloader]$ ./aws.sh 1.12.748

Configure the S3 handler

To configure a GoldenGate Replicat to send data to an S3 bucket, you need to set up a Replicat parameter file and a properties file that define how data is handled and sent to Amazon S3.

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the access key and secret access key of your IAM user, respectively. Don’t hardcode credentials or security keys in the parameter and properties files. There are several methods available to achieve this, such as the following:

#!/bin/bash

# Use environment variables that are already set in the OS
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
export AWS_REGION="your_aws_region"

You can set these environment variables in your shell configuration file (e.g., .bashrc, .bash_profile, .zshrc) or use a secure method to set them temporarily:

export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
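Because the credentials come from the environment, it can help to confirm that they are present before you start any GoldenGate process. The following is a minimal sketch; the helper name `check_aws_env` is hypothetical, and it only checks that the three variables used in this post are non-empty, not that the keys are valid.

```shell
#!/bin/bash
# Sketch: fail fast when a credential variable is unset or empty.
# check_aws_env is a hypothetical helper, not part of GoldenGate or the AWS CLI.
check_aws_env() {
  missing=0
  # ${VAR:-} expands to an empty string when VAR is unset
  [ -n "${AWS_ACCESS_KEY_ID:-}" ]     || { echo "missing: AWS_ACCESS_KEY_ID" >&2; missing=1; }
  [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || { echo "missing: AWS_SECRET_ACCESS_KEY" >&2; missing=1; }
  [ -n "${AWS_REGION:-}" ]            || { echo "missing: AWS_REGION" >&2; missing=1; }
  # Return 0 when everything is set, 1 otherwise
  return "$missing"
}
```

A typical use would be `check_aws_env && ./ggsci`, so that GGSCI is launched only when all three variables are available to the session.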

Configure the properties file

Create a properties file for the S3 handler. This file defines how GoldenGate will interact with your S3 bucket. Make sure that you have added the correct parameters as shown in the properties file.

The next code is an instance of an S3 handler properties file (dirprm/reps3.properties):

[oracle@hostname dirprm]$ cat reps3.properties
gg.handlerlist=filewriter

gg.handler.filewriter.type=filewriter
gg.handler.filewriter.fileRollInterval=60s
gg.handler.filewriter.fileNameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.format=json
gg.handler.filewriter.finalizeAction=rename
gg.handler.filewriter.fileRenameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.eventHandler=s3

goldengate.userexit.writers=javawriter
#TODO Set S3 Event Handler- please update as needed
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=eu-west-1
gg.eventhandler.s3.bucketMappingTemplate=s3bucketname
gg.eventhandler.s3.pathMappingTemplate=${tableName}_${currentTimestamp}
gg.eventhandler.s3.accessKeyId=$AWS_ACCESS_KEY_ID
gg.eventhandler.s3.secretKey=$AWS_SECRET_ACCESS_KEY

gg.classpath=/u01/app/oracle/product/OGG_BIG_DATA/dirprm/:/u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader/dependencies/aws_sdk_1.12.748/
gg.log=log4j
gg.log.level=DEBUG

#javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar -Daws.accessKeyId=my_access_key_id -Daws.secretKey=my_secret_key
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar
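Before starting the Replicat, you may want a quick check that the properties file contains the entries this walkthrough relies on. The following sketch reports any that are absent. The helper name `missing_props` and the choice of keys are assumptions drawn from the example file above; this is not an official list of mandatory GoldenGate properties.

```shell
#!/bin/bash
# Sketch: print keys from this post's example that a properties file lacks.
# missing_props is a hypothetical helper; $1 is the path to the properties file.
missing_props() {
  file="$1"
  for key in gg.handlerlist \
             gg.handler.filewriter.eventHandler \
             gg.eventhandler.s3.bucketMappingTemplate \
             gg.classpath; do
    # Split each line on "=" and compare the key exactly (no regex matching)
    awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file" \
      || echo "$key"
  done
}
```

Running `missing_props dirprm/reps3.properties` prints nothing when all four keys are present, and one key per line otherwise.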

Configure GoldenGate REPLICAT

Create the parameter file in /dirprm in the GoldenGate for Big Data home:

[oracle@hostname dirprm]$ vi rps3.prm
REPLICAT rps3
-- Command to add REPLICAT
-- add replicat fw, exttrail AdapterExamples/trail/tr
SETENV(GGS_JAVAUSEREXIT_CONF = 'dirprm/reps3.properties')
TARGETDB LIBFILE libggjava.so SET property=dirprm/reps3.properties
REPORTCOUNT EVERY 1 MINUTES, RATE
MAP SCOTT.EMP, TARGET gg.handler.s3handler;

[oracle@hostname OGG_BIG_DATA]$ ./ggsci
GGSCI > add replicat rps3, exttrail ./dirdat/tr/ea
Replicat added.

GGSCI > info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED RPS3 00:00:00 00:00:39

GGSCI > start *
Sending START request to Manager ...
Replicat group RPS3 beginning.

You have now successfully started the Replicat. You can verify this by running the info and stats commands followed by the Replicat name, as shown in the following screenshot.

To confirm that the file has been replicated to an S3 bucket, open the Amazon S3 console and open the bucket you created. You can see that the table data has been replicated to Amazon S3 in JSON file format.

Best practices

Make sure that you are following the best practices on performance, compression, and security.

Consider the following best practices for performance:

The following are best practices for compression:

  • Enable compression for trail files to reduce storage requirements and improve network transfer performance.
  • Use GoldenGate’s built-in compression capabilities or use file system-level compression tools.
  • Strike a balance between compression level and CPU overhead, because higher compression levels may impact performance.

Finally, when implementing Oracle GoldenGate for streaming data to Amazon S3 for real-time analytics, it’s crucial to address various security considerations to protect your data and infrastructure. Follow the security best practices for Amazon S3 and the security options available for GoldenGate Classic Architecture.

Clean up

To avoid ongoing charges, delete the resources that you created as part of this post:

  1. Remove the S3 bucket and trail files if no longer needed and stop the GoldenGate processes on Amazon EC2.
  2. Revert the changes that you made in the database (such as grants, supplemental logging, and archive log retention).
  3. To delete the entire setup, stop your EC2 instance.

Conclusion

In this post, we provided a step-by-step guide for installing and configuring GoldenGate for Oracle Classic Architecture and Big Data for streaming data from relational databases to Amazon S3. With these instructions, you can successfully set up an environment and take advantage of real-time analytics using a GoldenGate handler for Amazon S3, which we will explore further in an upcoming post.

If you have any comments or questions, leave them in the comments section.


About the Authors

Prasad Matkar is a Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.

Arun Sankaranarayanan is a Database Specialist Solution Architect based in London, UK. With a focus on purpose-built database engines, he assists customers in migrating and modernizing their database workloads to AWS.

Giorgio Bonzi is a Sr. Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.