Monday, March 10, 2025

Stream data to Amazon S3 for real-time analytics using the Oracle GoldenGate S3 handler


Modern business applications depend on timely and accurate data, and the demand for real-time analytics keeps growing. With it comes a growing need for efficient and scalable data storage solutions. Data is often stored in separate datasets and must be consolidated before meaningful, complete insights can be drawn from it. This is where replication tools help: they move the data from its source to the target systems in real time, transforming it as necessary to support consolidation.

In this post, we provide a step-by-step guide for installing and configuring Oracle GoldenGate for streaming data from relational databases to Amazon Simple Storage Service (Amazon S3) for real-time analytics using the Oracle GoldenGate S3 handler.

Oracle GoldenGate for Oracle Database and Big Data adapters

Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. It captures and applies transactional changes in real time, minimizing latency and keeping target systems synchronized with source databases. It supports data transformation, allowing changes during replication, and works with various database systems, including SQL Server, MySQL, and PostgreSQL. GoldenGate supports flexible replication topologies such as unidirectional, bidirectional, and multi-master configurations. Before using GoldenGate, make sure you have reviewed and adhere to the license agreement.

Oracle GoldenGate for Big Data provides adapters that facilitate real-time data integration from different sources to big data services like Hadoop, Apache Kafka, and Amazon S3. You can configure the adapters to control the data capture, transformation, and delivery process based on your specific requirements, supporting both batch-oriented and real-time streaming data integration patterns.

GoldenGate provides specific tools called S3 event handlers to integrate with Amazon S3 for data replication. These handlers allow GoldenGate to read data from and write data to S3 buckets. This capability lets you use Amazon S3 for GoldenGate deployments across on-premises, cloud, and hybrid environments.

Solution overview

The following diagram illustrates our solution architecture.

In this post, we walk you through the following high-level steps:

  1. Install the GoldenGate software on Amazon Elastic Compute Cloud (Amazon EC2).
  2. Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
  3. Replicate the data to Amazon S3 using the GoldenGate for Big Data S3 handler.

Prerequisites

You must have the following prerequisites in place:

Install the GoldenGate software on Amazon EC2

You need to run GoldenGate on EC2 instances. The instances must have enough CPU, memory, and storage to handle the expected replication volume. For more details, refer to Operating System Requirements. After you determine the CPU and memory requirements, select a current generation EC2 instance type for GoldenGate.

Use the following formula to estimate the required trail space:

trail disk space = transaction log volume in 1 hour x number of hours down x 0.4
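As a quick worked example of this formula, suppose the source generates 5 GB of transaction log per hour and you want to tolerate 8 hours of target downtime (both figures are hypothetical, chosen only for illustration):

```shell
# Hypothetical sizing: 5 GB of redo/transaction log per hour,
# tolerating 8 hours of downtime on the target.
LOG_GB_PER_HOUR=5
HOURS_DOWN=8

# trail disk space = log volume per hour x hours down x 0.4
TRAIL_GB=$(awk -v v="$LOG_GB_PER_HOUR" -v h="$HOURS_DOWN" 'BEGIN { printf "%g", v * h * 0.4 }')

echo "Provision at least ${TRAIL_GB} GB for trail files"
```

With these numbers, the formula yields 16 GB of trail space; size the EBS volume of the EC2 instance accordingly.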

When the EC2 instance is up and running, download the following GoldenGate software from the Oracle GoldenGate Downloads page:

  • GoldenGate 21.3.0.0
  • GoldenGate for Big Data 21c

Use the following steps to upload and install the file from your local machine to the EC2 instance. Make sure your IP address is allowed in the inbound rules of the security group of your EC2 instance before starting a session. For this use case, we install GoldenGate for Classic Architecture and GoldenGate for Big Data. See the following code:

scp -i pem-key.pem 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip ec2-user@hostname:~/.
ssh -i pem-key.pem ec2-user@hostname
unzip 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip

Install GoldenGate 21.3.0.0

Complete the following steps to install GoldenGate 21.3 on an EC2 instance:

  1. Create a home directory to install the GoldenGate software in, and change to the installer directory:
    mkdir /u01/app/oracle/product/OGG_DB_ORACLE
    cd /fbo_ggs_Linux_x64_Oracle_shiphome/Disk1
    
    ls -lrt
    total 8
    drwxr-xr-x.  4 oracle oinstall  187 Jul 29 2021 install
    drwxr-xr-x. 12 oracle oinstall 4096 Jul 29 2021 stage
    -rwxr-xr-x.  1 oracle oinstall  918 Jul 29 2021 runInstaller
    drwxrwxr-x.  2 oracle oinstall   25 Jul 29 2021 response

  2. Run runInstaller:
    [oracle@hostname Disk1]$ ./runInstaller
    Starting Oracle Universal Installer...
    Checking Temp space: must be greater than 120 MB.   Actual 193260 MB    Passed
    Checking swap space: must be greater than 150 MB.   Actual 15624 MB     Passed

A GUI window will pop up to install the software.

  3. Follow the instructions in the GUI to complete the installation process. Provide the directory path you created as the home directory for GoldenGate.

After the GoldenGate software installation is complete, you can create the GoldenGate processes that read the data from the source. First, you configure the OGG EXTRACT.

  4. Create an extract parameter file for the source Oracle database. The following code is the sample file content:
    [oracle@hostname Disk1]$ vi eabc.prm
    
    -- Extract group name
    EXTRACT EABC
    SETENV (TNS_ADMIN = "/u01/app/oracle/product/19.3.0/network/admin")
    
    -- Extract database user login
    USERID ggs_admin@mydb, PASSWORD "********"
    
    -- Local trail on the remote host
    EXTTRAIL /u01/app/oracle/product/OGG_DB_ORACLE/dirdat/ea
    
    -- Ignore operations applied by Replicat and capture application
    -- operations, excluding the GoldenGate admin user, to avoid
    -- recapturing replicated changes
    IGNOREREPLICATES
    GETAPPLOPS
    TRANLOGOPTIONS EXCLUDEUSER ggs_admin
    
    -- Table to capture
    TABLE scott.emp;

  5. Add the EXTRACT at the GoldenGate prompt by running the following command:
    GGSCI> ADD EXTRACT EABC, TRANLOG, BEGIN NOW

  6. After you add the EXTRACT, check the status of the running programs with the info all command.

You will see the EXTRACT status is in the STOPPED state, as shown in the following screenshot; this is expected.

  7. Start the EXTRACT process as shown in the following figure.

The status changes to RUNNING. The following are the different statuses:

  • STARTING – The process is starting.
  • RUNNING – The process has started and is working normally.
  • STOPPED – The process has stopped either normally (in a controlled manner) or due to an error.
  • ABENDED – The process has stopped in an uncontrolled manner. An abnormal end is known as ABEND.

This starts the extract process, and a trail file is created in the location specified in the extract parameter file.

  8. You can verify this by using the stats command, as shown in the following screenshot.
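Because the screenshots are not reproduced here, the full GGSCI session for adding, starting, and checking the extract can be sketched as follows, using the names from this walkthrough (extract EABC and the trail location from the parameter file). Note that the ADD EXTTRAIL command, which registers the local trail with the extract, is typically required even though it is not shown as a separate step above:

```text
GGSCI> ADD EXTRACT EABC, TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL /u01/app/oracle/product/OGG_DB_ORACLE/dirdat/ea, EXTRACT EABC
GGSCI> START EXTRACT EABC
GGSCI> INFO ALL
GGSCI> STATS EABC
```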

Install GoldenGate for Big Data 21c

In this step, we install GoldenGate for Big Data on the same EC2 instance where we installed GoldenGate Classic Architecture.

  1. Create a directory to install the GoldenGate for Big Data software in, copy and extract the .zip file, and then create the subdirectories and configure and start the Manager:
    mkdir /u01/app/oracle/product/OGG_BIG_DATA
    
    unzip 214000_ggs_Linux_x64_BigData_64bit.zip
    tar -xvf ggs_Linux_x64_BigData_64bit.tar
    
    GGSCI> CREATE SUBDIRS
    GGSCI> EDIT PARAM MGR
    PORT 7801
    
    GGSCI> START MGR

This starts the MANAGER program. Now you can install the dependencies required for the REPLICAT to run.

  2. Go to /u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader and run the aws.sh script with the latest version of the aws-java-sdk. This script downloads the AWS SDK, which provides client libraries for connectivity to the AWS Cloud.
    [oracle@hostname DependencyDownloader]$ ./aws.sh 1.12.748

Configure the S3 handler

To configure a GoldenGate Replicat to send data to an S3 bucket, you need to set up a Replicat parameter file and a properties file that defines how data is handled and sent to Amazon S3.

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the access key and secret access key of your IAM user, respectively. Don't hardcode credentials or security keys in the parameter and properties files. There are several methods available to achieve this, such as the following:

#!/bin/bash

# Use environment variables that are already set in the OS
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
export AWS_REGION="your_aws_region"

You can set these environment variables in your shell configuration file (for example, .bashrc, .bash_profile, or .zshrc) or use a secure method to set them temporarily:

export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"

Configure the properties file

Create a properties file for the S3 handler. This file defines how GoldenGate interacts with your S3 bucket. Make sure you have added the correct parameters as shown in the properties file.

The following code is an example of an S3 handler properties file (dirprm/reps3.properties):

[oracle@hostname dirprm]$ cat reps3.properties
gg.handlerlist=filewriter

gg.handler.filewriter.type=filewriter
gg.handler.filewriter.fileRollInterval=60s
gg.handler.filewriter.fileNameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.format=json
gg.handler.filewriter.finalizeAction=rename
gg.handler.filewriter.fileRenameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.eventHandler=s3

goldengate.userexit.writers=javawriter
#TODO Set S3 Event Handler - please update as needed
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=eu-west-1
gg.eventhandler.s3.bucketMappingTemplate=s3bucketname
gg.eventhandler.s3.pathMappingTemplate=${tableName}_${currentTimestamp}
gg.eventhandler.s3.accessKeyId=$AWS_ACCESS_KEY_ID
gg.eventhandler.s3.secretKey=$AWS_SECRET_ACCESS_KEY

gg.classpath=/u01/app/oracle/product/OGG_BIG_DATA/dirprm/:/u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader/dependencies/aws_sdk_1.12.748/*
gg.log=log4j
gg.log.level=DEBUG

#javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar -Daws.accessKeyId=my_access_key_id -Daws.secretKey=my_secret_key
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar
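When GoldenGate runs on an EC2 instance with an IAM role attached, a commonly used alternative to static keys is to omit the key properties entirely. This sketch assumes the S3 event handler falls back to the AWS SDK default credentials provider chain (which resolves the instance profile role) when no keys are configured, which is the usual behavior for AWS SDK clients:

```properties
# Sketch: no static keys, so the AWS SDK default credentials
# provider chain resolves credentials from the EC2 instance profile.
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=eu-west-1
gg.eventhandler.s3.bucketMappingTemplate=s3bucketname
gg.eventhandler.s3.pathMappingTemplate=${tableName}_${currentTimestamp}
# gg.eventhandler.s3.accessKeyId and gg.eventhandler.s3.secretKey
# are intentionally not set.
```

This avoids distributing long-lived credentials to the instance altogether and is generally preferable from a security standpoint; verify the fallback behavior against the S3 event handler documentation for your GoldenGate for Big Data release.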

Configure the GoldenGate REPLICAT

Create the parameter file in the /dirprm directory in the GoldenGate for Big Data home:

[oracle@hostname dirprm]$ vi rps3.prm
REPLICAT rps3
-- Command to add the REPLICAT
-- add replicat rps3, exttrail ./dirdat/tr/ea
SETENV(GGS_JAVAUSEREXIT_CONF = 'dirprm/reps3.properties')
TARGETDB LIBFILE libggjava.so SET property=dirprm/reps3.properties
REPORTCOUNT EVERY 1 MINUTES, RATE
MAP SCOTT.EMP, TARGET SCOTT.EMP;

[oracle@hostname OGG_BIG_DATA]$ ./ggsci
GGSCI> add replicat rps3, exttrail ./dirdat/tr/ea
Replicat added.

GGSCI> info all
Program     Status      Group    Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
REPLICAT    STOPPED     RPS3     00:00:00        00:00:39

GGSCI> start *
Sending START request to Manager ...
Replicat group RPS3 starting.

You have now successfully started the Replicat. You can verify this by running the info and stats commands followed by the Replicat name, as shown in the following screenshot.

To confirm that the file has been replicated to an S3 bucket, open the Amazon S3 console and open the bucket you created. You can see that the table data has been replicated to Amazon S3 in JSON file format.

Best practices

Make sure that you are following best practices for performance, compression, and security.

The following are best practices for compression:

  • Enable compression for trail files to reduce storage requirements and improve network transfer performance.
  • Use GoldenGate's built-in compression capabilities, or use file system-level compression tools.
  • Strike a balance between compression level and CPU overhead, because higher compression levels can impact performance.

Finally, when implementing Oracle GoldenGate for streaming data to Amazon S3 for real-time analytics, it's crucial to address various security considerations to protect your data and infrastructure. Follow the security best practices for Amazon S3 and the security options available for GoldenGate Classic Architecture.

Clean up

To avoid ongoing charges, delete the resources that you created as part of this post:

  1. Remove the S3 bucket and trail files if they are no longer needed, and stop the GoldenGate processes on Amazon EC2.
  2. Revert the changes that you made in the database (such as grants, supplemental logging, and archive log retention).
  3. To delete the entire setup, stop your EC2 instance.

Conclusion

In this post, we provided a step-by-step guide for installing and configuring GoldenGate for Oracle Classic Architecture and GoldenGate for Big Data for streaming data from relational databases to Amazon S3. With these instructions, you can set up an environment and take advantage of real-time analytics using the GoldenGate handler for Amazon S3, which we'll explore further in an upcoming post.

If you have any comments or questions, leave them in the comments section.


About the Authors

Prasad Matkar is a Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.

Arun Sankaranarayanan is a Database Specialist Solution Architect based in London, UK. With a focus on purpose-built database engines, he assists customers in migrating and modernizing their database workloads to AWS.

Giorgio Bonzi is a Sr. Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.
