Most companies have adopted a diverse set of software as a service (SaaS) platforms to support various applications. This rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows.
More companies have realized there is an opportunity to integrate, enhance, and present this SaaS data to improve internal operations and gain valuable insights from it. Using AWS Glue, a serverless data integration service, companies can streamline this process, integrating data from internal and external sources into a centralized AWS data lake. From there, they can perform meaningful analytics, gain valuable insights, and optionally push enriched data back to external SaaS platforms.
This post introduces the new HubSpot managed connector for AWS Glue, and demonstrates how you can integrate HubSpot data into your existing data lake on AWS. By consolidating HubSpot data with data from your AWS accounts and from other SaaS services, you can enhance, analyze, and optionally write the data back to HubSpot, creating a seamless and integrated data experience.
Solution overview
In this example, we use AWS Glue to extract, transform, and load (ETL) data from your HubSpot account into a transactional data lake on Amazon Simple Storage Service (Amazon S3), using Apache Iceberg format. We register the schema in the AWS Glue Data Catalog to make your data discoverable. We then use Amazon Athena to validate that the HubSpot data has been successfully loaded to Amazon S3. The following diagram illustrates the solution architecture.
The following are key components and steps in the integration:
- Configure your HubSpot account and app to enable access to your HubSpot data.
- Prepare for data movement by securely storing your HubSpot OAuth credentials in AWS Secrets Manager, creating an S3 bucket to store your ingested data, and creating an AWS Identity and Access Management (IAM) role for AWS Glue.
- Create an AWS Glue job to extract and load data from HubSpot to Amazon S3. AWS Glue establishes a secure connection to HubSpot using OAuth for authorization and TLS for data encryption in transit. AWS Glue also supports applying complex data transformations, enabling efficient data integration and preparation to meet your needs.
- Schema and other metadata are registered in the AWS Glue Data Catalog, a centralized metadata repository for all your data assets. This helps simplify schema management, and also makes the data discoverable by other services.
- Run the AWS Glue job to extract data from HubSpot and write it to Amazon S3 using Iceberg format. Apache Iceberg is an open source, high-performance open table format designed for large-scale analytics, providing transactional consistency and seamless schema evolution. Although we use Iceberg in this example, AWS Glue offers robust support for various data formats, including other transactional formats such as Apache Hudi and Delta Lake.
- The data loaded to Amazon S3 is organized into partitioned folders to optimize for query performance and management. Amazon S3 also stores the AWS Glue scripts, logs, and other temporary data required during the ETL process.
- Finally, Amazon Athena is used to query the data loaded from HubSpot to Amazon S3, validating that all changes in the source system have been captured successfully.
- Optionally, you can schedule the AWS Glue job to regularly synchronize HubSpot data to Amazon S3 and analyze data updates over time.
Set up your HubSpot account
This example requires you to create a HubSpot public app for AWS Glue in a HubSpot developer account, and connect it to an associated HubSpot account. A HubSpot public app is a type of integration that can be installed on your HubSpot accounts or listed on the HubSpot Marketplace. In this example, you create a HubSpot app for the AWS Glue integration, and install it in a new test account. Although HubSpot calls it a public app, it will not be listed in their Marketplace and will only have access to your test account.
- If you don’t already have one, sign up for a free HubSpot developer account.
- Log in to your HubSpot developer account, where you’ll see options to create apps and test accounts.
- Choose Create a test account and follow the instructions.
HubSpot test accounts have Enterprise versions of the HubSpot Marketing, Sales, and Service Hubs along with sample data, so you can test most HubSpot tools, create CRM data, and access it through APIs with AWS Glue. For more information about creating a test account, refer to Create a developer test account.
Create a HubSpot app
Complete the following steps to create a HubSpot app:
- Switch back to your HubSpot developer account, and choose Create an app.
- Fill in the App Info section with the name AWS Glue and a brief description.
- Choose the Auth tab.
- For Redirect URLs, enter the redirect URL for AWS Glue in the form https://<aws-region>.console.aws.amazon.com/gluestudio/oauth. Be sure to replace <aws-region> with the AWS Region where you run AWS Glue. For example, the code for the US East (N. Virginia) Region is us-east-1, so the AWS Glue redirect URL is https://us-east-1.console.aws.amazon.com/gluestudio/oauth.
- In the Scopes section, choose Add new scope and select the following permissions:
  - automation
  - content
  - crm.lists.read
  - crm.lists.write
  - crm.objects.companies.read
  - crm.objects.companies.write
  - crm.objects.contacts.read
  - crm.objects.contacts.write
  - crm.objects.custom.read
  - crm.objects.custom.write
  - crm.objects.deals.read
  - crm.objects.deals.write
  - crm.objects.owners.read
  - crm.schemas.custom.read
  - e-commerce
  - forms
  - oauth
  - sales-email-read
  - tickets
- Review the Scopes and Redirect URL settings, then choose Create app.
- Navigate back to your app’s Auth tab.
- Take note of the values for Client ID, Client secret, and Install URL (OAuth). You will need these later to connect from AWS Glue.
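The redirect URL in the steps above differs only by Region code, so it can be generated rather than typed by hand. The following is a minimal sketch; the helper name `glue_redirect_url` and the loose Region pattern are illustrative assumptions, not part of AWS Glue or HubSpot.

```python
import re


def glue_redirect_url(region: str) -> str:
    """Return the AWS Glue Studio OAuth redirect URL for a Region code.

    Hypothetical helper for illustration; the URL shape follows the
    pattern given in the steps above.
    """
    # Loose sanity check on the Region code shape (for example "us-east-1").
    # This pattern is an assumption and is not an authoritative Region list.
    if not re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d", region):
        raise ValueError(f"unexpected Region code: {region!r}")
    return f"https://{region}.console.aws.amazon.com/gluestudio/oauth"


print(glue_redirect_url("us-east-1"))
```

The value returned for your Region is what you would paste into the Redirect URLs field of the HubSpot app.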
Select or create an Amazon S3 bucket where your HubSpot data will reside
Select an existing Amazon S3 bucket in your account, or create a new bucket to store your HubSpot data, as well as scripts, logs, and so on. For this example, the bucket name follows the format aws-glue-hubspot-<account>-<region>, where <account> is the AWS account number and <region> is the operating Region. The bucket is configured with all defaults: public access disabled, versioning disabled, and server-side encryption with Amazon S3 managed keys (SSE-S3).
If you use AWSGlueServiceRole in your IAM role as shown in this example, it provides access to S3 buckets with names starting with aws-glue-.
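As a small illustration of the naming convention above, the following sketch composes the bucket name from an account number and a Region. The function name and the account-then-Region ordering are assumptions made for this example; the only hard requirement from the text is the aws-glue- prefix when you rely on AWSGlueServiceRole.

```python
def hubspot_bucket_name(account_id: str, region: str) -> str:
    """Compose a bucket name of the form aws-glue-hubspot-<account>-<region>.

    Hypothetical helper; the placeholder order is an assumption.
    """
    name = f"aws-glue-hubspot-{account_id}-{region}"
    # S3 bucket names must be between 3 and 63 characters long.
    if not 3 <= len(name) <= 63:
        raise ValueError(f"bucket name has invalid length: {name!r}")
    return name


# 123456789012 is a placeholder account number, not a real account.
print(hubspot_bucket_name("123456789012", "us-east-1"))
```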
Create an IAM role for AWS Glue
Create an IAM role with permissions for the AWS Glue job. AWS Glue will assume this role when calling other services on your behalf.
- On the IAM console, choose Roles in the navigation pane.
- Choose Create role.
- For Trusted entity type, choose AWS service.
- For Use case, choose Glue.
- Add the following AWS managed policies to the role:
  - AWSGlueServiceRole for accessing related services such as Amazon S3, Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudWatch, and IAM. This policy allows access to S3 buckets with names starting with aws-glue-.
  - SecretsManagerReadWrite for read/write access to AWS Secrets Manager.
- Give the role a name, for instance AWSGlueServiceRole_blog.
For more information, see Getting started with AWS Glue and Create an IAM role for AWS Glue.
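Choosing AWS service and Glue in the console steps above produces a role whose trust policy lets the AWS Glue service assume it. The following sketch builds that trust policy and lists the two managed policy ARNs as plain Python data, for readers who script role creation. The function and constant names are illustrative; the sketch only prints the JSON document and doesn't call AWS.

```python
import json


def glue_trust_policy() -> dict:
    """Trust policy that allows the AWS Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# ARNs of the two AWS managed policies named in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
]

print(json.dumps(glue_trust_policy(), indent=2))
```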
Create an AWS Secrets Manager secret
AWS Secrets Manager is used to securely store your HubSpot OAuth credentials. Complete the following steps to create a secret:
- On the AWS Secrets Manager console, choose Secrets in the navigation pane.
- Choose Store a new secret.
- For Secret type, select Other type of secret.
- Under Key/value pairs, enter the HubSpot client secret with the key USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET.
- Choose Next.
- Enter the secret name, such as HubSpot-Blog, and a description, then choose Next.
- Leave the secret rotation as default, and choose Next.
- Review the secret configuration, and choose Store.
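The key/value pair entered in the console is stored as a small JSON document. The following sketch shows that payload, assuming you wanted to create the secret from a script instead of the console; `build_secret_string` is a hypothetical helper, and the client secret shown is a placeholder.

```python
import json

# Key name expected for the HubSpot client secret, as given in the steps above.
SECRET_KEY = "USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET"


def build_secret_string(client_secret: str) -> str:
    """Return the JSON string to store as the secret value."""
    if not client_secret:
        raise ValueError("client_secret must not be empty")
    return json.dumps({SECRET_KEY: client_secret})


# Placeholder value; never hard-code or log a real client secret.
payload = build_secret_string("example-client-secret")
print(sorted(json.loads(payload).keys()))
```

A string like this is what you would supply as the secret value if you created the secret programmatically.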
Create an AWS Glue connection
Complete the following steps to create an AWS Glue connection to your HubSpot account:
- On the AWS Glue console, choose Data connections in the navigation pane.
- Choose Create connection.
- For Data sources, search for and select HubSpot.
- Choose Next.
- On the Configure connection page, fill in the required information:
  - For IAM service role, choose the service role created previously. In this example, we use the role AWSGlueServiceRole_blog.
  - For Authentication URL, leave as default.
  - For User Managed Client Application ClientId, enter the OAuth client ID from HubSpot.
  - For AWS Secret, choose the OAuth client secret name configured previously in AWS Secrets Manager.
  - Choose Next.
- Choose Test Connection to validate the connection to HubSpot.
  - This will bring up a new HubSpot connection window. Be sure to select your HubSpot test account (not your developer account) to test the connection.
  - If this is your first connection attempt, you will be redirected to another page where you are asked to confirm the access level granted to AWS Glue. Choose Connect App.
  If successful, the HubSpot window will close and your AWS connection window will say Connection test successful.
- Under Set properties, for Name, enter a name (for example, HubSpot_Connection_blog).
- Choose Next.
- Under Review and create, review your settings and then create the connection.
Create a database in AWS Glue Data Catalog
Complete the following steps to create a database in AWS Glue Data Catalog to organize your HubSpot data:
- On the AWS Glue console, choose Databases in the navigation pane.
- Create a new database.
- Enter a name (for example, hubspot).
- You can leave the location field blank.
- Choose Create database.
Create an AWS Glue ETL job
Now that you have an AWS Glue data connection to your HubSpot account, you can create an AWS Glue ETL job to ingest HubSpot data into your AWS data lake. AWS Glue provides both visual and code-based interfaces to simplify data integration, depending on your expertise. In this example, we use the Script interface to ingest HubSpot data into the Amazon S3 location. Complete the following steps:
- On the AWS Glue console, choose ETL jobs in the navigation pane.
- Choose the Script editor.
- Choose Spark as the engine, and add the following script.
The AWS Glue Spark job reads the HubSpot data and merges it into the S3 bucket in Iceberg format.
- On the Job details tab, provide the following information:
  - For Name, enter a name, such as HubSpot_to_S3_blog.
  - For Description, enter a meaningful description of the job.
  - For IAM Role, choose the IAM role you created previously (for this post, AWSGlueServiceRole_blog).
  - Expand Advanced properties.
  - Under Connections, enter your HubSpot connection from the previous section (for this post, HubSpot_Connection_blog).
  - Under Job parameters, enter the following parameters:
    - For --conf, enter spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.warehouse=file:///tmp/spark-warehouse
    - For --datalake-formats, enter iceberg
    - For --db_name, enter the AWS Glue database to store your data lake (for this post, hubspot)
    - For --table_name, enter the HubSpot table to be ingested (for this post, company)
    - For --s3_bucket_name, enter where the ingested Iceberg table is stored, in this case aws-glue-hubspot-<account>-<region>
    - For --connection_name, enter the AWS Glue connection name created, in this case HubSpot_Connection_blog
- Choose Save to save the job, then choose Run.
Depending on the amount of data in your HubSpot account, the job can take a few minutes to complete. After a successful job run, you can choose Run details to see the job specifications and logs.
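The --conf job parameter above packs several Spark settings into one value: the first setting is the value itself, and each further setting is chained with a literal " --conf " separator. The following sketch rebuilds that string, and the full set of job parameters, from a dictionary so that individual settings are easier to review. The function names are illustrative assumptions, and the bucket name passed in the example uses a placeholder account number.

```python
# Spark settings from the --conf job parameter, in the order given above.
ICEBERG_CONF = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.warehouse": "file:///tmp/spark-warehouse",
}


def build_conf_value(settings: dict) -> str:
    """Chain key=value pairs into a single --conf parameter value."""
    return " --conf ".join(f"{key}={value}" for key, value in settings.items())


def build_job_parameters(db_name: str, table_name: str,
                         s3_bucket_name: str, connection_name: str) -> dict:
    """Return the job parameters listed in the steps above as a dictionary."""
    return {
        "--conf": build_conf_value(ICEBERG_CONF),
        "--datalake-formats": "iceberg",
        "--db_name": db_name,
        "--table_name": table_name,
        "--s3_bucket_name": s3_bucket_name,
        "--connection_name": connection_name,
    }


params = build_job_parameters(
    "hubspot", "company",
    "aws-glue-hubspot-123456789012-us-east-1",  # placeholder bucket name
    "HubSpot_Connection_blog",
)
for key, value in params.items():
    print(key, "=", value)
```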
Use Athena to query data
Athena is an interactive, serverless query service that makes it easy to analyze data directly in Amazon S3 using standard SQL. In this example, we query the results of the HubSpot data ingested into Amazon S3.
- On the Athena console, choose Query editor.
- For Database, choose hubspot, and you should see your company table.
- Select entries from the hubspot.company table to view the data captured from HubSpot.
You can try various queries on the HubSpot data.
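As a starting point, the following sketch generates two simple validation queries for the Athena query editor: a row preview and a row count. It deliberately avoids naming individual columns, because the columns available depend on your HubSpot data. The function name and the `company_count` alias are illustrative assumptions.

```python
def sample_queries(database: str = "hubspot", table: str = "company",
                   limit: int = 10) -> list:
    """Return simple Athena SQL statements for checking the ingested table."""
    # Athena accepts double-quoted identifiers in queries.
    qualified = f'"{database}"."{table}"'
    return [
        f"SELECT * FROM {qualified} LIMIT {int(limit)}",
        f"SELECT COUNT(*) AS company_count FROM {qualified}",
    ]


for query in sample_queries():
    print(query)
```

Running the count query before and after a job rerun gives a quick check that new records were picked up.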
Over time, your HubSpot data may change. You can rerun your ETL job periodically, and the Iceberg data lake table will capture your changes. You can verify this by adding, removing, and changing companies in your HubSpot database, and then rerunning the ETL job. Your data lake should match your latest HubSpot data. With this capability, you can schedule the ETL job to run as often as you need.
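Reruns can keep the table in sync because the job merges each fresh extract into the Iceberg table instead of appending to it. The following sketch shows what such a merge statement could look like in Spark SQL with Iceberg. It is not the script from this post: the source view name `hubspot_updates` and the join key `id` are assumptions made for illustration, and the statement shown handles inserts and updates only.

```python
def build_merge_sql(db_name: str, table_name: str,
                    source_view: str = "hubspot_updates",
                    key: str = "id") -> str:
    """Build an Iceberg MERGE INTO statement for an upsert.

    glue_catalog is the Spark catalog name configured in the --conf
    job parameter. source_view and key are hypothetical names.
    """
    target = f"glue_catalog.{db_name}.{table_name}"
    return (
        f"MERGE INTO {target} t "
        f"USING {source_view} s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


print(build_merge_sql("hubspot", "company"))
```

In a Spark job, a statement like this would typically be run with `spark.sql(...)` after registering the freshly read HubSpot data as a temporary view.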
Extending the HubSpot connector with AWS services
The HubSpot connector for AWS Glue provides a powerful foundation for building comprehensive data pipelines and analytics workflows. By integrating HubSpot data into your AWS environment, you can use additional services like Amazon Redshift, Amazon QuickSight, and Amazon SageMaker to further process, transform, and analyze the data. This allows you to construct sophisticated, end-to-end data architectures that unlock the full value of your HubSpot data, without the need to manage complex infrastructure. The seamless integration between these AWS services makes it easy to build scalable analytics pipelines tailored to your specific requirements.
Considerations
You can set up AWS Glue job triggers to run the ETL jobs on a schedule, so that the data is regularly synchronized between HubSpot and Amazon S3. You can also integrate the ETL jobs with other AWS services, including AWS Step Functions, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), AWS Lambda, Amazon EventBridge, and Amazon Bedrock to create a more advanced data processing pipeline.
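For example, a scheduled trigger can be described as a set of request parameters. The following sketch builds the arguments you could pass to the AWS Glue `create_trigger` API (for instance through boto3) without calling AWS itself. The helper name, the default trigger name, and the daily 02:00 UTC schedule are assumptions chosen for illustration.

```python
def scheduled_trigger_request(job_name: str,
                              schedule: str = "cron(0 2 * * ? *)",
                              trigger_name: str = "") -> dict:
    """Build keyword arguments for a scheduled AWS Glue trigger.

    AWS Glue cron expressions have six fields:
    minutes, hours, day-of-month, month, day-of-week, year.
    """
    if not (schedule.startswith("cron(") and schedule.endswith(")")):
        raise ValueError(f"expected a cron(...) expression, got {schedule!r}")
    return {
        "Name": trigger_name or f"{job_name}_schedule",
        "Type": "SCHEDULED",
        "Schedule": schedule,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


request = scheduled_trigger_request("HubSpot_to_S3_blog")
print(request)
```

With boto3 installed and credentials configured, the dictionary could be passed as `glue_client.create_trigger(**request)`.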
By default, the HubSpot connector doesn’t import deleted records. However, you can set the IMPORT_DELETED_RECORDS option to true to import all records, including the deleted ones.
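The following sketch shows where such an option could sit when reading from HubSpot in a job script: as one entry in the connection options dictionary passed to the AWS Glue reader. Only IMPORT_DELETED_RECORDS comes from this post. The other keys (`connectionName`, `ENTITY_NAME`, `API_VERSION`), the `v3` default, and passing the flag as the string "true" are assumptions to verify against the connector documentation.

```python
def hubspot_read_options(connection_name: str, entity_name: str,
                         import_deleted: bool = False,
                         api_version: str = "v3") -> dict:
    """Build a connection options dictionary for reading a HubSpot entity.

    Hypothetical helper; key names other than IMPORT_DELETED_RECORDS
    are assumptions and should be checked before use.
    """
    options = {
        "connectionName": connection_name,
        "ENTITY_NAME": entity_name,
        "API_VERSION": api_version,
    }
    if import_deleted:
        # Opt in to importing deleted records as well as live ones.
        options["IMPORT_DELETED_RECORDS"] = "true"
    return options


print(hubspot_read_options("HubSpot_Connection_blog", "company", import_deleted=True))
```

Leaving `import_deleted` at its default keeps the connector's default behavior of skipping deleted records.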
Clean up
To avoid incurring charges, clean up the resources used in this post from your AWS account, including the AWS Glue jobs, HubSpot connection, AWS Secrets Manager secret, IAM role, and Amazon S3 bucket.
Conclusion
With the introduction of the AWS Glue connector for HubSpot, integrating HubSpot data with information from other data sources has become more streamlined than ever. This feature enables you to set up ongoing data integration from HubSpot to AWS, providing a unified view of data from across platforms and enabling more comprehensive analytics. The serverless nature of AWS Glue means there is no infrastructure to manage, and you only pay for the resources consumed. By following the steps outlined in this post, you can make sure that up-to-date data from HubSpot is captured in your data lake, allowing teams to make faster data-driven decisions and uncover complex insights from across data sources.
To learn more about the AWS Glue connector for HubSpot, refer to Connecting to HubSpot in AWS Glue. This guide walks through the entire process, from setting up the connection to running the data transfer flow. For more information on AWS Glue, visit AWS Glue.
About the Authors
Eric Bomarsi is a Senior Solutions Architect in the ISV group at AWS, where he focuses on building scalable solutions for large customers. As a member of the AWS analytics group, he helps customers get strategic insights from their data. Outside of work, he enjoys playing ice hockey and traveling with his family.
Annie Nelson is a Senior Solutions Architect at AWS. She is a data enthusiast who enjoys problem solving and tackling complex architectural challenges with customers.
Kartikay Khator is a Solutions Architect within Global Life Sciences at AWS, where he dedicates his efforts to developing innovative and scalable solutions that cater to the evolving needs of customers. His expertise lies in harnessing the capabilities of AWS analytics services. Beyond his professional pursuits, he finds joy and fulfillment in the world of running and hiking. Having already completed multiple marathons, he is currently preparing for his next marathon challenge.
Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect, Amazon MWAA and AWS Glue ETL expert. He’s on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest Amazon MWAA and AWS Glue features and news!