SAP is one of the most robust ERP systems, used in sectors such as finance, supply chain management, and manufacturing, and it is recognized for its ability to optimize critical business operations. AWS, on the other hand, is widely recognized as a strong and reliable cloud platform, offering solutions for storage, analytics, and data processing.

Given the strengths of these two platforms, it is natural for companies to seek the best of both worlds and establish a connection between them. Often, data stored in SAP needs to be transferred to AWS for backups, machine learning, or other advanced analytics capabilities. Conversely, information processed in AWS is frequently needed back in the SAP system.

In this post we will explain different approaches to transfer data between these two platforms.

Using AWS Environment

Connect AWS and SAP HANA using Glue

This post will show how to create a connection to SAP HANA tables using AWS Glue, how to read data from HANA tables and store the extracted data in an S3 bucket in CSV format, and how to read a CSV file stored in S3 and write the data into SAP HANA.

AWS Architecture
Pre-requisites:
  • HANA Table created
  • S3 Bucket
    • Data in CSV format (only to import data to SAP HANA)
    • SAP HANA JDBC driver

Components Creation

Secrets Manager

Let’s start by storing the HANA connection data in AWS Secrets Manager:

  • user
  • password
  • url

*All this information should be provided by the HANA system.

Go to Secrets Manager in the AWS console and create a new secret; select Other type of secret and add the keys with their corresponding values.

AWS Key Value pairs

Go to the next page and make sure to give this secret a descriptive name. It is good practice to also add a description, but this is optional.

AWS Secret name and description
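As an alternative to the console, the same secret could also be created programmatically. Below is a minimal sketch using boto3; the secret name sap-hana-connection and the values are placeholders, not values from a real system.

# Hypothetical example: creating the connection secret with boto3.
# The secret name and all values are placeholders.
import json
import boto3

secrets = boto3.client("secretsmanager")

secrets.create_secret(
    Name="sap-hana-connection",
    Description="Connection data for the SAP HANA system",
    SecretString=json.dumps({
        "user": "HANA_USER",
        "password": "HANA_PASSWORD",
        "url": "jdbc:sap://hana-host:30015/",
    }),
)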

IAM Role & policy

We need a role with access to the S3 bucket, the Glue components, and the Secrets Manager secret used for the connection. AWS roles work with policies that are created separately and attached afterwards. It is recommended to create them in this order (policy first, then role) so we can grant only the minimum permissions needed for this process to work.

Policy creation

Under Policies, click on Create policy, select Secrets Manager, and allow the following access:

  • GetSecretValue

Under Resources, make sure to add the ARN of the secret we created previously.

Add more permissions at the bottom, select S3, and allow the following access:

  • ListBucket
  • GetObject
  • PutObject

Under Resources, make sure to add the ARN of the S3 bucket where the data and the JDBC driver are stored.

Add one more permission block for Glue and allow the following access:

  • GetJob
  • GetJobs
  • GetJobRun
  • GetJobRuns
  • StartJobRun
  • GetConnection
  • GetConnections

Under Resources, make sure to add the ARNs of the Glue resources (jobs and connections) this role will use.

In order to monitor our job, we must add permission for logs. Add CloudWatch Logs with the following access (a sketch of the complete policy follows the list below):

  • DescribeLogGroups
  • CreateLogGroup
  • CreateLogStream
  • PutLogEvents
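
For reference, the combination of the permissions above could look roughly like the following policy, here created with boto3. The region, account ID, bucket name, and resource names are placeholders; adjust them to your own ARNs.

# A minimal sketch of the policy described above, created with boto3.
# REGION, ACCOUNT_ID, bucket and resource names are placeholders.
import json
import boto3

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {  # Read the HANA connection secret
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:sap-hana-connection-*",
        },
        {  # Read and write data (and the JDBC driver) in the bucket
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        },
        {  # Run and inspect the Glue job and connection
            "Effect": "Allow",
            "Action": ["glue:GetJob", "glue:GetJobs", "glue:GetJobRun", "glue:GetJobRuns",
                       "glue:StartJobRun", "glue:GetConnection", "glue:GetConnections"],
            "Resource": ["arn:aws:glue:REGION:ACCOUNT_ID:job/*",
                         "arn:aws:glue:REGION:ACCOUNT_ID:connection/*",
                         "arn:aws:glue:REGION:ACCOUNT_ID:catalog"],
        },
        {  # Write job logs to CloudWatch
            "Effect": "Allow",
            "Action": ["logs:DescribeLogGroups", "logs:CreateLogGroup",
                       "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:REGION:ACCOUNT_ID:log-group:*",
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="SAP-HANA-DataTransfer-Policy",
    PolicyDocument=json.dumps(policy_document),
)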

Optional: You can use the predefined policy AWSGlueServiceRole instead; with it, the access to Glue and CloudWatch Logs is already allowed.

Role

To create the IAM Role, go to IAM in the AWS console and select Roles on the left panel, click on Create Role, and select the AWS Service option.

Then, in the Use Case section, select Glue, since this role will be mainly used for Glue.

The next step is to add permissions to our role. For this it is recommended to use specific policies that include the minimum access our role needs. You can select the policy if you have already created it; otherwise, just click Next and attach the policy once it is created.

In the next step we will be asked for the name of the role. It is recommended to use a descriptive name such as SAP-HANA-DataTransfer-Role, together with a good description explaining the purpose of the role.
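
The role could also be created programmatically. Below is a minimal sketch using boto3; the role name, description, and policy ARN are placeholders and must match the policy created above.

# Hypothetical sketch: creating the Glue role and attaching the custom policy.
import json
import boto3

iam = boto3.client("iam")

# Trust policy allowing AWS Glue to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="SAP-HANA-DataTransfer-Role",
    Description="Role used by Glue to move data between SAP HANA and S3",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the custom policy created in the previous step (ARN is a placeholder)
iam.attach_role_policy(
    RoleName="SAP-HANA-DataTransfer-Role",
    PolicyArn="arn:aws:iam::ACCOUNT_ID:policy/SAP-HANA-DataTransfer-Policy",
)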

Glue

Glue is a flexible tool that allows us to connect to SAP HANA using different methods. First, I would like to point out that Glue offers the Glue connection feature, an entity that contains data such as user, URL, and password; it can also be linked to Secrets Manager to avoid exposing sensitive data. This connection can then be referenced in the Glue job, adding a layer of abstraction to the project. However, this is not the only way to connect: it is also possible to create the connection within the Glue job using a JDBC driver, which is more transparent about how the connection is made and can be useful in some scenarios.

Another feature of Glue is Glue Studio Visual ETL. With it, you can develop an ETL process visually: using drag and drop, you select predefined blocks divided into three categories: Source, Transformation, and Target.

The traditional way to create a Glue Job is with the Script Editor option; by selecting this option you get a text editor where you can write the code.

This document will present the following scenarios:

  • Glue Connection + Glue Job using Visual ETL
  • Glue Connection + Glue Job using Script Editor
  • Glue Job with embedded JDBC connection

Glue connection

The first step is to create a connection from AWS to SAP; for this we are going to use the Glue connection feature. On the left side of the AWS Glue console, select Data connections and then Create connection.

This will open a wizard showing different options for data sources; select SAP HANA.

Provide the URL and choose the secret created previously.

Give a descriptive name to the connection and create the resource.

Glue Job

A Glue Job will be our bridge to connect to SAP HANA. As mentioned before, there are different ways to create a Glue Job: using the console, the Visual Editor, or the Script Editor.

Glue Connection + Glue Job using Visual ETL

Export HANA Table to S3
  1. Create a Glue Job with Visual ETL
  2. Select SAP HANA in Sources
  3. Select S3 in Targets
  4. Add Custom Transform and select SAP HANA as Node Parent
  5. Add Select from Collection and select Custom Transform as Node Parent
  6. Configure the source:
    1. On the right panel, Data Source properties:
      1. Select the SAP HANA connection
      2. Provide the table to connect: schema_name.table_name
    2. Under Data Preview, select the IAM Role created
    A preview of the data contained in the table will be shown.
  7. Configure the target:
    1. On the right panel, Data Target properties:
      1. Select the format CSV
      2. Provide the S3 path where the CSV file is going to be stored
    To avoid the content being split into multiple files, a transformation and a selection are needed to condense everything into one single file.
  8. Confirm that Amazon S3 has Select from Collection as parent node
Glue Job

Add the following code to the transformation module:

def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
    # Take the single incoming DynamicFrame from the collection
    df = dfc.select(list(dfc.keys())[0]).toDF()
    # Condense everything into one partition so a single file is written
    df_coalesced = df.coalesce(1)
    dynamic_frame = DynamicFrame.fromDF(df_coalesced, glueContext, "dynamic_frame")
    # Return a collection so the Select from Collection node can pick it up
    return DynamicFrameCollection({"coalesced": dynamic_frame}, glueContext)

Save & Run

Import CSV file to HANA
  1. Create a Glue Job with Visual ETL
  2. Select S3 in Sources
  3. Select SAP HANA in Targets
Glue Job
Configure the source:
  1. On the right panel, Data Source properties:
    1. Provide the S3 URL where the CSV file is stored: s3://bucket_name/prefix/file
    2. Select CSV format and set the configuration according to your file
  2. Under Data Preview, select the IAM Role created
  3. A preview of the file content will be shown
Configure the target:
  1. On the right panel, Data Sink properties:
    1. Select Amazon S3 as Node Parent
    2. Provide the table to connect: schema_name.table_name
      Note: If the table does not exist and your user has enough privileges, the table will be created.

Save & Run

Glue Job using Script Editor
  1. Create a Glue Job with the Script Editor
  2. Select Spark as Engine
  3. Select Start fresh and use the following code

The snippets below assume the standard Glue boilerplate (imports, SparkContext, GlueContext, and the Spark session) is present at the top of the script.
Export HANA Table to S3 – Using Glue Connection

# Read from HANA
# Option 1: read a whole table
df = glueContext.create_dynamic_frame.from_options(
    connection_type="saphana",
    connection_options={
        "connectionName": "Name of the connection",
        "dbtable": "schema.table",
    },
)

# Option 2: read with a query (alternative to Option 1)
df = glueContext.create_dynamic_frame.from_options(
    connection_type="saphana",
    connection_options={
        "connectionName": "Name of the connection",
        "query": "SELECT * FROM schema.table",
    },
)

# Condense into one single file
df_coalesced = df.coalesce(1)

# Write CSV to S3
glueContext.write_dynamic_frame.from_options(
    frame=df_coalesced,
    connection_type="s3",
    connection_options={"path": "s3://bucket/prefix/"},
    format="csv",
    format_options={
        "quoteChar": -1,  # -1 disables quoting
    },
)

Import CSV file to HANA – Using Glue Connection

Read CSV File

dynamicFrame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://bucket/prefix/file.csv"]},
    format="csv",
    format_options={
        "withHeader": True,
    },
)

# Write into SAP HANA
glueContext.write_dynamic_frame.from_options(
    frame=dynamicFrame,
    connection_type="saphana",
    connection_options={
        "connectionName": "Connection_Name",
        "dbtable": "schema.table",
    },
)

Import/Export using JDBC Connection
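
The snippets below reference variables such as url, username, password, and jdbc_driver_name. The following is a minimal sketch of how they could be resolved from the secret created earlier; the secret name is a placeholder, and the HANA JDBC driver jar stored in S3 (see the prerequisites) must be made available to the job, for example through the job's Dependent JARs path.

# Hypothetical sketch: resolve the HANA connection details from Secrets Manager.
# The secret name "sap-hana-connection", schema, and table names are placeholders.
import json
import boto3

secret = boto3.client("secretsmanager").get_secret_value(SecretId="sap-hana-connection")
conn = json.loads(secret["SecretString"])

jdbc_driver_name = "com.sap.db.jdbc.Driver"  # SAP HANA JDBC driver class
url = conn["url"]
username = conn["user"]
password = conn["password"]
schema = "SCHEMA_NAME"      # placeholder
table_name = "TABLE_NAME"   # placeholder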

# Export HANA Table to S3
df = (glueContext.read.format("jdbc")
      .option("driver", jdbc_driver_name)
      .option("url", url)
      .option("currentschema", schema)
      .option("dbtable", table_name)
      .option("user", username)
      .option("password", password)
      .load())

df_coalesced = df.coalesce(1)  # condense into one partition so only one file is created

(df_coalesced.write.mode("overwrite")
    .option("header", "true")
    .option("quote", "\u0000")  # effectively disables quoting
    .csv("s3://bucket_name/prefix/"))

# Import CSV file to HANA
df2 = spark.read.csv(
    "s3://bucket/prefix/file.csv",
    header=True,
    inferSchema=True,
)

# Write data into SAP HANA
(df2.write.format("jdbc")
    .option("driver", jdbc_driver_name)
    .option("url", url)
    .option("dbtable", f"{schema}.{table_name}")
    .option("user", username)
    .option("password", password)
    .mode("append")
    .save())

  4. Go to Job Details:
    1. Give the job a name
    2. Select the IAM Role
    3. Under Advanced properties:
      1. Add the SAP HANA Connection
  5. Save & Run
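
Optionally, once the job is saved, it can also be triggered and monitored programmatically using the StartJobRun and GetJobRun permissions granted earlier. The job name below is a placeholder for the name given in the Job Details step.

# Hypothetical sketch: start the Glue job and check its status with boto3.
import boto3

glue = boto3.client("glue")

run = glue.start_job_run(JobName="sap-hana-to-s3")  # placeholder job name

# Check the state of the run (RUNNING, SUCCEEDED, FAILED, ...)
status = glue.get_job_run(JobName="sap-hana-to-s3", RunId=run["JobRunId"])
print(status["JobRun"]["JobRunState"])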

Using HANA Environment

Exporting HANA tables to AWS S3

This section will show an option to export data from SAP HANA tables to S3 storage using a CSV file.

Pre-requisites:
  • HANA Table
  • S3 Bucket
  • AWS Access Key & Secret Key
  • AWS Certificate

Configure the AWS Certificate in HANA

An AWS certificate needs to be added as a trusted source in order to export the data.

-- 1. Create the SSL certificate store (PSE)
CREATE PSE SSL;

-- 2. Register the S3 certificate
CREATE CERTIFICATE FROM 'AWS Certificate Content' COMMENT 'S3';

-- 3. Get the certificate ID
SELECT * FROM CERTIFICATES WHERE COMMENT = 'S3';

-- 4. Add the S3 certificate to the SSL certificate store
-- (CERTIFICATE_NUMBER is the certificate ID returned by the SELECT in step 3)
ALTER PSE SSL ADD CERTIFICATE CERTIFICATE_NUMBER;

-- 5. Set the purpose of the certificate store
SET PSE SSL PURPOSE REMOTE SOURCE;

Export Data – GUI Option

SAP software offers the option to export data to several cloud targets by right-clicking on the table to export.

It shows a wizard to select:

  1. Data source: Schema and Table/View
  2. Export Target: Amazon S3, Azure Storage, Alibaba Cloud
    1. Once we select Amazon S3, it will ask for the S3 Region and S3 Path. This information can be consulted in the AWS console, in the bucket where we want to store the exported data.
    2. By clicking on the Compose button, a prompt will appear to enter the Access Key, Secret Key, Bucket Name, and Object ID.
      Note: the Object ID will be the name of the file.
  3. Export options: defines the format (CSV/PARQUET) and the configuration of the generated CSV file.

Finally, we can export the file by clicking on the Export button, and it should be uploaded to our S3 bucket.

Export Data – SQL Option

However, we will get the same result if we run the following query from the SQL console in SAP HANA:

EXPORT INTO 's3-REGION://ACCESSKEY:SECRETKEY@BUCKET_NAME/FOLDER_NAME/file_name.csv' FROM SCHEMA_NAME.TABLE/VIEW_NAME;

It is also possible to import CSV files stored in S3 by using the IMPORT statement:

IMPORT FROM CSV FILE 's3-REGION://ACCESSKEY:SECRETKEY@BUCKET_NAME/FOLDER_NAME/file_name.csv' INTO SCHEMA_NAME.TABLE_NAME;

Conclusion

It is possible to have the best of both worlds by using these solutions, and each one fits different circumstances. Exporting data directly from the SAP system is advisable when a direct connection is needed to offload datasets for backups or data lake solutions. This also allows better cost control, since storage in S3 can easily be managed with the different storage tiers existing in AWS.

On the other hand, using AWS Glue is convenient when an automated data extraction process is required; in addition, Glue can transform and load the data into S3, where it can be processed for machine learning or data warehousing.

In summary, combining SAP HANA’s export capabilities with AWS Glue’s data transformation tools enables efficient and scalable data management.