©2019 by Raghavendra Kambhampati

How to generate TPC-H data in AWS EC2 Linux & upload to AWS S3?

Updated: Aug 16, 2019

TPC-H is a Decision Support Benchmark

The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.
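For reference, and as we read the TPC-H specification, the composite metric is the geometric mean of a power score and a throughput score. Here QI(i,0) and RI(j,0) are the power-test timing intervals in seconds of query i and refresh function j, S is the number of query streams, and T_s is the throughput-test measurement interval in seconds:

```latex
\text{Power@Size} = \frac{3600 \cdot SF}{\left( \prod_{i=1}^{22} QI(i,0) \cdot \prod_{j=1}^{2} RI(j,0) \right)^{1/24}}

\text{Throughput@Size} = \frac{S \cdot 22 \cdot 3600}{T_s} \cdot SF

\text{QphH@Size} = \sqrt{\text{Power@Size} \cdot \text{Throughput@Size}}
```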

TPC-H comes with various data set sizes to test different scaling factors.

SF (Gigabytes) | Size
1 | Consists of the base row size (several million elements).
10 | Consists of the base row size x 10.
100 | Consists of the base row size x 100 (several hundred million elements).
1000 | Consists of the base row size x 1000 (several billion elements).


To summarize the TPC-H database:

The database uses a third normal form (3NF) schema consisting of 8 tables.

The benchmarks can be run using pre-determined database sizes, referred to as “scale factors”. Each scale factor corresponds roughly to the raw data size of the data warehouse in gigabytes (SF 1 is about 1GB).

6 of the 8 tables grow linearly with the scale factor and are populated with data that is uniformly distributed.

22 complex, long-running query templates and 2 data refresh functions (insert and delete) are run in parallel to test concurrency.

The number of concurrent query streams increases with the scale factor: for example, the 100TB benchmark runs 11 concurrent streams.



Database Entities, Relationships, and Characteristics

The components of TPC-H consist of eight separate and individual tables (the Base Tables). The relationships between columns in these tables are illustrated in the following ER diagram:

The parentheses following each table name contain the prefix of the column names for that table;

The arrows point in the direction of the one-to-many relationships between tables;

The number/formula below each table name represents the cardinality (number of rows) of the table. Some are factored by SF, the Scale Factor, to obtain the chosen database size. The cardinality for the LINEITEM table is approximate.

The schema consists of 8 tables, 8 explicit unique indexes supporting 8 primary keys, and 9 explicit indexes supporting 9 foreign keys.


TPC-H workload types:

Type | Features | TPC-H business questions
A | Medium dimensionality; result is independent of the TPC-H scale factor | Q1, Q3, Q4, Q5, Q6, Q7, Q8, Q12, Q13, Q14, Q16, Q19, Q22
B | High dimensionality; few results, lots of empty cells | Q15, Q18
C | High dimensionality; result size is a percentage of the scale factor | Q2, Q9, Q10, Q11, Q17, Q20, Q21


Steps to Generate 100GB TPC-H Data in AWS EC2 Linux and Upload it to AWS S3:

Sign up on the TPC website (http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp) to download the TPC-H tool used to generate the data set, and download the zip file.

Log in to the AWS Console (don't log in with the root account) with the credentials provided or created->Click Services->EC2->Launch Instance->Select the Amazon Linux 2 AMI.

Select t2.micro (free tier eligible)->Click Next Configure Instance Details.

Select the VPC where you want to launch your EC2 instance (the default VPC or a custom VPC)->Select a public subnet->Enable Auto-assign Public IP->Click Next Add Storage.

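Select the EBS root volume and enter its size in GB->Click Next Add Tags. The size depends on the TPC-H data set you need; this walkthrough originally used 100GB, but at scale factor 100 dbgen writes a little over 100GB of raw .tbl files, so leave headroom for the operating system and the gzip step (for example, 150GB).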

Add a tag with the key Name and the value TPC-H Data (any value works)->Click Next Configure Security Group.

Select an existing security group or create a new one->Select SSH as the Type and My IP as the Source (never use 0.0.0.0/0; opening SSH to everyone is not a good practice)->Click Review and Launch.

Review all details and click Launch->Select an existing key pair that you created or create a new key pair (you need it to log in to the EC2 instance).

Click View Instances to see the status of your EC2 instance. Once the status changes to Running, log in to the instance with PuTTY using its Public DNS or Public IP.
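The console steps above can also be approximated from the AWS CLI. This is a minimal sketch, assuming the CLI is already configured on your workstation and that the key pair, security group and public subnet already exist. launch_tpch_instance is a hypothetical helper; it resolves the latest Amazon Linux 2 AMI through AWS's public SSM parameter, and its 150GB default root volume is an assumption sized for scale factor 100.

```shell
# Hypothetical helper: launch an Amazon Linux 2 t2.micro with a larger root volume
# Usage: launch_tpch_instance KEY_NAME SECURITY_GROUP_ID SUBNET_ID [SIZE_GB]
launch_tpch_instance() {
  local key_name="$1" sg_id="$2" subnet_id="$3" size_gb="${4:-150}"
  local ami_id
  # Latest Amazon Linux 2 AMI ID, published by AWS as a public SSM parameter
  ami_id=$(aws ssm get-parameters \
    --names /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2 \
    --query 'Parameters[0].Value' --output text) || return 1
  # /dev/xvda is the root device name on Amazon Linux 2
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type t2.micro \
    --key-name "$key_name" \
    --security-group-ids "$sg_id" \
    --subnet-id "$subnet_id" \
    --associate-public-ip-address \
    --block-device-mappings "[{\"DeviceName\":\"/dev/xvda\",\"Ebs\":{\"VolumeSize\":$size_gb,\"VolumeType\":\"gp2\"}}]" \
    --tag-specifications '[{"ResourceType":"instance","Tags":[{"Key":"Name","Value":"TPC-H Data"}]}]'
}
```

For example, `launch_tpch_instance my-key sg-0123 subnet-0456 150` (all IDs are placeholders); once the instance is running, `ssh -i my-key.pem ec2-user@<public-dns>` is the command-line equivalent of the PuTTY login.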

After logging in to the EC2 instance, copy the TPC-H software downloaded in step 1 to the instance. The free WinSCP tool works for this: enter the EC2 Public DNS hostname and ec2-user as the username (the default user on Amazon Linux instances), then click Advanced.

Under Advanced, select Authentication under SSH->Point to the location where you have stored the .ppk key file->Click OK.

Click on Yes.

Select the downloaded TPC-H zip file and drag it to the /tmp directory on the EC2 instance, then right-click the file and change its permissions so that ec2-user can read it (the zip appears as -rwxrwxr-x in the directory listing below).
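If your local machine runs Linux or macOS, scp can replace WinSCP. This is a minimal sketch: copy_tool_to_ec2 is a hypothetical helper, and it expects the .pem private key downloaded when the key pair was created (scp does not use PuTTY's .ppk format).

```shell
# Hypothetical helper: copy the TPC-H tool zip into /tmp on the EC2 instance
# Usage: copy_tool_to_ec2 ZIP_PATH KEY_PEM PUBLIC_DNS
copy_tool_to_ec2() {
  local zip_path="$1" key_pem="$2" host="$3"
  # ssh/scp refuse private keys that other users can read
  chmod 400 "$key_pem" || return 1
  # Same destination as the WinSCP steps: /tmp on the instance, as ec2-user
  scp -i "$key_pem" "$zip_path" "ec2-user@${host}:/tmp/"
}
```

For example, `copy_tool_to_ec2 7a7a0fca-3c10-4ec0-9fd1-6ed4acf3f3a4-tpc-h-tool.zip ~/.ssh/my-key.pem <public-dns>`, where the key path and host are placeholders.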


After the software has been copied successfully, log in to the EC2 instance as ec2-user and follow the steps below.

[ec2-user@ip-10-0-0-228 ~]$ cd /tmp
[ec2-user@ip-10-0-0-228 tmp]$ cp 7a7a0fca-3c10-4ec0-9fd1-6ed4acf3f3a4-tpc-h-tool.zip /home/ec2-user
[ec2-user@ip-10-0-0-228 ~]$ cd /home/ec2-user
[ec2-user@ip-10-0-0-228 ~]$ unzip 7a7a0fca-3c10-4ec0-9fd1-6ed4acf3f3a4-tpc-h-tool.zip
[ec2-user@ip-10-0-0-228 ~]$ ls -ltr
total 25316
drwxrwxr-x 5 ec2-user ec2-user      121 Sep 27  2017 2.17.3
-rwxrwxr-x 1 ec2-user ec2-user 25920568 Nov  1 05:56 7a7a0fca-3c10-4ec0-9fd1-6ed4acf3f3a4-tpc-h-tool.zip
[ec2-user@ip-10-0-0-228 ~]$ sudo yum group install "Development Tools"

Installed:
  autoconf.noarch 0:2.69-11.amzn2           automake.noarch 0:1.13.4-3.1.amzn2      bison.x86_64 0:3.0.4-6.amzn2.0.2          byacc.x86_64 0:1.9.20130304-3.amzn2.0.2
  cscope.x86_64 0:15.8-10.amzn2.0.2         ctags.x86_64 0:5.8-13.amzn2.0.2         diffstat.x86_64 0:1.57-4.amzn2.0.2        doxygen.x86_64 1:1.8.5-3.amzn2.0.2
  elfutils.x86_64 0:0.170-4.amzn2           flex.x86_64 0:2.5.37-3.amzn2.0.3        gcc.x86_64 0:7.3.1-5.amzn2.0.2            gcc-c++.x86_64 0:7.3.1-5.amzn2.0.2
  gcc-gfortran.x86_64 0:7.3.1-5.amzn2.0.2   git.x86_64 0:2.14.5-1.amzn2             indent.x86_64 0:2.2.11-13.amzn2.0.2       intltool.noarch 0:0.50.2-7.amzn2
  libtool.x86_64 0:2.4.2-22.2.amzn2.0.2     patch.x86_64 0:2.7.1-10.amzn2.0.2       patchutils.x86_64 0:0.3.3-4.amzn2.0.1     rcs.x86_64 0:5.9.0-5.amzn2.0.2
  rpm-build.x86_64 0:4.11.3-25.amzn2.0.3    rpm-sign.x86_64 0:4.11.3-25.amzn2.0.3   subversion.x86_64 0:1.7.14-11.amzn2.0.2   swig.x86_64 0:3.0.12-11.amzn2.0.3
  systemtap.x86_64 0:3.2-4.amzn2.0.2
Dependency Installed:
apr.x86_64 0:1.6.3-5.amzn2.0.2 apr-util.x86_64 0:1.6.1-5.amzn2.0.2 apr-util-bdb.x86_64 0:1.6.1-5.amzn2.0.2
avahi-libs.x86_64 0:0.6.31-19.amzn2 cpp.x86_64 0:7.3.1-5.amzn2.0.2 dwz.x86_64 0:0.11-3.amzn2.0.2
efivar-libs.x86_64 0:31-4.amzn2.0.4 elfutils-libelf-devel.x86_64 0:0.170-4.amzn2 emacs-filesystem.noarch 1:25.3-3.amzn2.0.1
gdb.x86_64 0:8.0.1-30.amzn2.0.3 gettext-common-devel.noarch 0:0.19.8.1-2.amzn2.0.2 gettext-devel.x86_64 0:0.19.8.1-2.amzn2.0.2
git-core.x86_64 0:2.14.5-1.amzn2 git-core-doc.x86_64 0:2.14.5-1.amzn2 glibc-devel.x86_64 0:2.26-28.amzn2.0.1
glibc-headers.x86_64 0:2.26-28.amzn2.0.1 gnutls.x86_64 0:3.3.26-9.amzn2.0.2 kernel-devel.x86_64 0:4.14.72-73.55.amzn2
kernel-headers.x86_64 0:4.14.72-73.55.amzn2 libatomic.x86_64 0:7.3.1-5.amzn2.0.2 libcilkrts.x86_64 0:7.3.1-5.amzn2.0.2
libgfortran.x86_64 0:7.3.1-5.amzn2.0.2 libitm.x86_64 0:7.3.1-5.amzn2.0.2 libmodman.x86_64 0:2.0.1-8.amzn2.0.2
libmpc.x86_64 0:1.0.1-3.amzn2.0.2 libmpx.x86_64 0:7.3.1-5.amzn2.0.2 libproxy.x86_64 0:0.4.11-10.amzn2.0.3
libquadmath.x86_64 0:7.3.1-5.amzn2.0.2 libsanitizer.x86_64 0:7.3.1-5.amzn2.0.2 libsecret.x86_64 0:0.18.5-2.amzn2.0.2
m4.x86_64 0:1.4.16-10.amzn2.0.2 mokutil.x86_64 1:0.3.0-10.amzn2.0.1 mpfr.x86_64 0:3.1.1-4.amzn2.0.2
neon.x86_64 0:0.30.0-3.amzn2.0.2 nettle.x86_64 0:2.7.1-8.amzn2.0.2 pakchois.x86_64 0:0.4-10.amzn2.0.2
perl-Data-Dumper.x86_64 0:2.145-3.amzn2.0.2 perl-Error.noarch 1:0.17020-2.amzn2 perl-Git.noarch 0:2.14.5-1.amzn2
perl-TermReadKey.x86_64 0:2.30-20.amzn2.0.2 perl-Test-Harness.noarch 0:3.28-3.amzn2 perl-Thread-Queue.noarch 0:3.02-2.amzn2
perl-XML-Parser.x86_64 0:2.41-10.amzn2.0.2 perl-srpm-macros.noarch 0:1-8.amzn2.0.1 subversion-libs.x86_64 0:1.7.14-11.amzn2.0.2
system-rpm-config.noarch 0:9.1.0-76.amzn2.0.8 systemtap-client.x86_64 0:3.2-4.amzn2.0.2 systemtap-devel.x86_64 0:3.2-4.amzn2.0.2
trousers.x86_64 0:0.3.14-2.amzn2.0.2 zlib-devel.x86_64 0:1.2.7-17.amzn2.0.2
Complete!
[ec2-user@ip-10-0-0-228 ~]$ cd 2.17.3    # dbgen software directory after unzipping the file
[ec2-user@ip-10-0-0-228 2.17.3]$ ls -ltr
total 2984
-rw-rw-r-- 1 ec2-user ec2-user 17809 Sep 20 2017 EULA.txt
-rw-rw-r-- 1 ec2-user ec2-user 2422117 Sep 20 2017 tpc-ds_v2.17.3.pdf
-rw-rw-r-- 1 ec2-user ec2-user 602880 Sep 20 2017 tpc-ds_v2.17.3.docx
drwxrwxr-x 10 ec2-user ec2-user 101 Sep 27 2017 ref_data
drwxrwxr-x 2 ec2-user ec2-user 34 Sep 27 2017 dev-tools
drwxrwxr-x 8 ec2-user ec2-user 4096 Sep 27 2017 dbgen
[ec2-user@ip-10-0-0-228 2.17.3]$ cd dbgen
[ec2-user@ip-10-0-0-228 dbgen]$ ls -ltr
total 360
-rw-rw-r-- 1 ec2-user ec2-user 15399 Sep 20 2017 dss.h
-rw-rw-r-- 1 ec2-user ec2-user 12160 Sep 20 2017 varsub.c
-rw-rw-r-- 1 ec2-user ec2-user 430 Sep 20 2017 update_release.sh
-rw-rw-r-- 1 ec2-user ec2-user 4929 Sep 20 2017 tpch.vcproj
-rw-rw-r-- 1 ec2-user ec2-user 1317 Sep 20 2017 tpch.sln
-rw-rw-r-- 1 ec2-user ec2-user 725 Sep 20 2017 tpch.dsw
-rw-rw-r-- 1 ec2-user ec2-user 3817 Sep 20 2017 tpcd.h
-rw-rw-r-- 1 ec2-user ec2-user 8413 Sep 20 2017 text.c
-rw-rw-r-- 1 ec2-user ec2-user 6623 Sep 20 2017 speed_seed.c
-rw-rw-r-- 1 ec2-user ec2-user 1761 Sep 20 2017 shared.h
-rw-rw-r-- 1 ec2-user ec2-user 619 Sep 20 2017 rng64.h
-rw-rw-r-- 1 ec2-user ec2-user 3788 Sep 20 2017 rng64.c
-rw-rw-r-- 1 ec2-user ec2-user 4612 Sep 20 2017 rnd.h
-rw-rw-r-- 1 ec2-user ec2-user 5243 Sep 20 2017 rnd.c
-rw-rw-r-- 1 ec2-user ec2-user 96 Sep 20 2017 release.h
-rw-rw-r-- 1 ec2-user ec2-user 17617 Sep 20 2017 README
-rw-rw-r-- 1 ec2-user ec2-user 4916 Sep 20 2017 qgen.vcproj
-rw-rw-r-- 1 ec2-user ec2-user 14404 Sep 20 2017 qgen.c
-rw-rw-r-- 1 ec2-user ec2-user 9582 Sep 20 2017 print.c
-rw-rw-r-- 1 ec2-user ec2-user 9176 Sep 20 2017 PORTING.NOTES
-rw-rw-r-- 1 ec2-user ec2-user 3357 Sep 20 2017 permute.h
-rw-rw-r-- 1 ec2-user ec2-user 3685 Sep 20 2017 permute.c
-rw-rw-r-- 1 ec2-user ec2-user 6360 Sep 20 2017 makefile.suite
-rw-rw-r-- 1 ec2-user ec2-user 4377 Sep 20 2017 load_stub.c
-rw-rw-r-- 1 ec2-user ec2-user 23726 Sep 20 2017 HISTORY
-rw-rw-r-- 1 ec2-user ec2-user 5127 Sep 20 2017 dsstypes.h
-rw-rw-r-- 1 ec2-user ec2-user 2072 Sep 20 2017 dss.ri
-rw-rw-r-- 1 ec2-user ec2-user 3814 Sep 20 2017 dss.ddl
-rw-rw-r-- 1 ec2-user ec2-user 20158 Sep 20 2017 driver.c
-rw-rw-r-- 1 ec2-user ec2-user 11815 Sep 20 2017 dists.dss
-rw-rw-r-- 1 ec2-user ec2-user 5154 Sep 20 2017 dbgen.dsp
-rw-rw-r-- 1 ec2-user ec2-user 6358 Sep 20 2017 config.h
-rw-rw-r-- 1 ec2-user ec2-user 166 Sep 20 2017 column_split.sh
-rw-rw-r-- 1 ec2-user ec2-user 11413 Sep 20 2017 build.c
-rw-rw-r-- 1 ec2-user ec2-user 27872 Sep 20 2017 BUGS
-rw-rw-r-- 1 ec2-user ec2-user 13632 Sep 20 2017 bm_utils.c
-rw-rw-r-- 1 ec2-user ec2-user 859 Sep 20 2017 bcd2.h
-rw-rw-r-- 1 ec2-user ec2-user 6072 Sep 20 2017 bcd2.c
drwxrwxr-x 2 ec2-user ec2-user 80 Sep 27 2017 variants
drwxrwxr-x 2 ec2-user ec2-user 137 Sep 27 2017 tests
drwxrwxr-x 2 ec2-user ec2-user 4096 Sep 27 2017 reference
drwxrwxr-x 2 ec2-user ec2-user 305 Sep 27 2017 queries
drwxrwxr-x 2 ec2-user ec2-user 92 Sep 27 2017 check_answers
drwxrwxr-x 2 ec2-user ec2-user 327 Sep 27 2017 answers
[ec2-user@ip-10-0-0-228 dbgen]$ cp makefile.suite makefile
[ec2-user@ip-10-0-0-228 dbgen]$ ls -ltr makefile*
-rw-rw-r-- 1 ec2-user ec2-user 6360 Sep 20 2017 makefile.suite
-rw-rw-r-- 1 ec2-user ec2-user 6360 Nov 1 06:10 makefile
[ec2-user@ip-10-0-0-228 dbgen]$ vi makefile    # set the values below to compile the dbgen utility
CC = gcc
DATABASE = ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
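The vi edits above can also be applied non-interactively. This is a minimal sketch, assuming GNU sed (the default on Amazon Linux 2) and the directory layout of this walkthrough; build_dbgen is a hypothetical helper. It copies makefile.suite if needed, sets the same four values, and then runs make.

```shell
# Hypothetical helper: configure the dbgen makefile and compile it
# Usage: build_dbgen [DBGEN_DIR]   (note: changes the current directory)
build_dbgen() {
  local dir="${1:-$HOME/2.17.3/dbgen}"
  cd "$dir" || return 1
  [ -f makefile ] || cp makefile.suite makefile
  # Same four settings as above; GNU sed edits the file in place
  sed -i \
    -e 's/^CC[[:space:]]*=.*/CC = gcc/' \
    -e 's/^DATABASE[[:space:]]*=.*/DATABASE = ORACLE/' \
    -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
    -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
    makefile
  # Compile dbgen with the configured settings
  make
}
```

After it finishes, the full scale factor 100 data set can be generated from the same directory with `./dbgen -vf -s 100`.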

NOTE: Running dbgen with -s 100 generates about 100GB of data for all 8 tables (a scale factor 100 database). To generate 1TB, use a scale factor of 1000. TPC-H runs are only compliant when run against SFs of 1, 10, 100, 300, 1000, 3000, 10000, 30000, 100000.

To split the 100GB of data for the 8 tables across multiple files, run the command below:

./dbgen -vf -s 100 -S 1 -C 10

-s 100 specifies a scale factor of 100, meaning that we generate approximately 100GB of benchmark data. -S 1 instructs dbgen to generate the first of the 10 chunks. -C 10 is the total number of chunks (files) for each large table (the nation and region tables are not split). Run the same command with -S 2 through -S 10 to generate the remaining chunks.

TPC-H Population Generator (Version 2.17.0)

Copyright Transaction Processing Performance Council 1994 - 2010

Generating data for suppliers table/

Preloading text ... 100%

done.

Generating data for customers tabledone.

Generating data for orders/lineitem tablesdone.

Generating data for part/partsupplier tablesdone.

Generating data for nation tabledone.

Generating data for region tabledone.
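To generate all 10 chunks instead of only the first, the -S 1 through -S 10 runs can be scripted. This is a minimal sketch: gen_tpch_chunks is a hypothetical helper, and its default of 2 parallel jobs is an assumption, so match it to your instance's vCPU count (a t2.micro has one). As far as we know, dbgen names chunked output with a numeric suffix, such as lineitem.tbl.1.

```shell
# Hypothetical helper: run dbgen once per chunk (-S 1 .. -S N), a few at a time
# Usage (from the dbgen directory): gen_tpch_chunks [SF] [CHUNKS] [JOBS]
gen_tpch_chunks() {
  local sf="${1:-100}" chunks="${2:-10}" jobs="${3:-2}" i
  for i in $(seq 1 "$chunks"); do
    # -f overwrites existing files; -v is dropped so parallel runs do not
    # interleave their progress output
    ./dbgen -f -s "$sf" -S "$i" -C "$chunks" &
    # Wait for the current batch to finish before starting the next one
    if [ $((i % jobs)) -eq 0 ]; then
      wait
    fi
  done
  wait
}
```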

You will see the generated .tbl data files in the dbgen directory (/home/ec2-user/2.17.3/dbgen in this walkthrough).

These are basic instructions, but they should make it much easier than reading the TPC-H docs to figure out how to run the dbgen utility. Let me know if you have questions.

Number of rows based on the scale factor:

Here we selected a scale factor of 100 to generate 100GB of data.

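Rows per table by TPC-H scale:

TPC-H Scale (GB) | lineitem | orders | partsupp | part | customer | supplier
10 | 60M | 15M | 8M | 2M | 1.5M | 0.1M
20 | 120M | 30M | 16M | 4M | 3M | 0.2M
50 | 300M | 75M | 40M | 10M | 7.5M | 0.5M
100 | 600M | 150M | 80M | 20M | 15M | 1M
200 | 1.2B | 300M | 160M | 40M | 30M | 2M
1000 | 6B | 1.5B | 800M | 200M | 150M | 10M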
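The counts above scale linearly with SF. This minimal sketch computes them from per-SF base cardinalities (supplier 10,000; part 200,000; partsupp 800,000; customer 150,000; orders 1,500,000; lineitem about 6,000,000; nation and region fixed at 25 and 5) and compares them with `wc -l` on the generated files. expected_rows and check_rows are hypothetical helpers; they expect unchunked .tbl files, so run them before the gzip step below.

```shell
# Hypothetical helper: expected row count for one table at a given scale factor
expected_rows() {
  local sf="$1" table="$2" base
  case "$table" in
    supplier) base=10000 ;;
    part)     base=200000 ;;
    partsupp) base=800000 ;;
    customer) base=150000 ;;
    orders)   base=1500000 ;;
    lineitem) base=6000000 ;;  # approximate: lines per order vary
    nation)   echo 25; return ;;
    region)   echo 5; return ;;
    *) echo "unknown table: $table" >&2; return 1 ;;
  esac
  # awk handles fractional SFs and prints large counts without overflow
  awk -v sf="$sf" -v base="$base" 'BEGIN { printf "%.0f\n", sf * base }'
}

# Hypothetical helper: compare expected vs. actual line counts of the .tbl files
check_rows() {
  local sf="$1" dir="${2:-.}" t
  for t in supplier part partsupp customer orders lineitem nation region; do
    [ -f "$dir/$t.tbl" ] || continue
    printf '%-9s expected %s, found %s\n' "$t" \
      "$(expected_rows "$sf" "$t")" "$(wc -l < "$dir/$t.tbl")"
  done
}
```

For example, `check_rows 100 /home/ec2-user/2.17.3/dbgen`. Counting 600M lineitem rows reads the whole file, so expect it to take a while.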


Upload TPC-H 100GB Data to AWS S3 from EC2:

Since we have generated 100GB of TPC-H data with the dbgen utility on EC2, we will compress all 8 files with gzip so that they consume less disk space and, when CPU power is abundant, reduce read I/O.

[ec2-user@ip-10-0-0-228 dbgen]$ gzip partsupp.tbl

[ec2-user@ip-10-0-0-228 dbgen]$ gzip supplier.tbl

[ec2-user@ip-10-0-0-228 dbgen]$ gzip region.tbl

[ec2-user@ip-10-0-0-228 dbgen]$ gzip nation.tbl

[ec2-user@ip-10-0-0-228 dbgen]$ gzip customer.tbl

[ec2-user@ip-10-0-0-228 dbgen]$ gzip lineitem.tbl

[ec2-user@ip-10-0-0-228 dbgen]$ gzip orders.tbl

[ec2-user@ip-10-0-0-228 dbgen]$ gzip part.tbl
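The eight gzip commands above run one after another. This minimal sketch compresses the files in parallel instead. compress_tables is a hypothetical helper and the default of 4 jobs is an assumption; it also picks up chunk files such as lineitem.tbl.1.

```shell
# Hypothetical helper: gzip every uncompressed .tbl file (including chunk files)
# in a directory, running several gzip processes at once
# Usage: compress_tables [DIR] [JOBS]
compress_tables() {
  local dir="${1:-.}" jobs="${2:-4}"
  # gzip writes the .gz first and removes the original afterwards, so each
  # running job needs free disk space for its compressed copy
  find "$dir" -maxdepth 1 -type f -name '*.tbl*' ! -name '*.gz' -print0 |
    xargs -0 -r -n 1 -P "$jobs" gzip
}
```

For example, `compress_tables /home/ec2-user/2.17.3/dbgen 4`.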

After running the above commands for all 8 files, each .tbl file is replaced by a compressed .tbl.gz file.

Since we used the Amazon Linux 2 AMI, the AWS CLI comes pre-installed with all dependent packages.
We just need to configure the AWS CLI with the user's aws_access_key_id and aws_secret_access_key to copy data from EC2 to AWS S3.
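Before copying the data to AWS S3, we need to create the S3 bucket. The per-table folders (for example, supplier/csv/) are key prefixes, which aws s3 cp creates automatically when it uploads a file.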
======================Run the below command to verify awscli setup========================================
[root@ip-10-0-0-228 ~]# aws --version
aws-cli/1.15.80 Python/2.7.14 Linux/4.14.72-73.55.amzn2.x86_64 botocore/1.10.79
======================Run the below command to configure awscli===========================================
[root@ip-10-0-0-228 ~]# aws configure
AWS Access Key ID [None]: ################
AWS Secret Access Key [None]: ###########################
Default region name [None]: ap-south-1
Default output format [None]:
======================Run below commands to verify awscli configuration===================================
[root@ip-10-0-0-228 ~]# ls .aws/
config  credentials
[root@ip-10-0-0-228 ~]# cat .aws/config 
[default]
region = ap-south-1
[root@ip-10-0-0-228 ~]# cat .aws/credentials
[default]
aws_access_key_id = ##############
aws_secret_access_key = ######################
======================S3 commands to copy TPC-H gzip data to AWS S3=========================================
aws s3 cp /home/ec2-user/2.17.3/dbgen/supplier.tbl.gz s3://redshift-demokloud/supplier/csv/
aws s3 cp /home/ec2-user/2.17.3/dbgen/customer.tbl.gz s3://redshift-demokloud/customer/csv/
aws s3 cp /home/ec2-user/2.17.3/dbgen/orders.tbl.gz s3://redshift-demokloud/order/csv/
aws s3 cp /home/ec2-user/2.17.3/dbgen/lineitem.tbl.gz s3://redshift-demokloud/lineitem/csv/
aws s3 cp /home/ec2-user/2.17.3/dbgen/partsupp.tbl.gz s3://redshift-demokloud/partsupplier/csv/
aws s3 cp /home/ec2-user/2.17.3/dbgen/part.tbl.gz s3://redshift-demokloud/part/csv/
aws s3 cp /home/ec2-user/2.17.3/dbgen/nation.tbl.gz s3://redshift-demokloud/nation/csv/
aws s3 cp /home/ec2-user/2.17.3/dbgen/region.tbl.gz s3://redshift-demokloud/region/csv/
============================================================================================================
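The bucket creation and the eight aws s3 cp commands above can be wrapped in one loop. This is a minimal sketch: upload_tpch is a hypothetical helper, the ap-south-1 default comes from the configuration above, and the table-to-folder pairs mirror the commands above (orders goes to order/, partsupp to partsupplier/).

```shell
# Hypothetical helper: create the bucket if needed, then upload the 8 gzipped tables
# Usage: upload_tpch BUCKET [DBGEN_DIR] [REGION]
upload_tpch() {
  local bucket="$1" dir="${2:-/home/ec2-user/2.17.3/dbgen}" region="${3:-ap-south-1}"
  local pair t prefix
  # head-bucket fails if the bucket does not exist (or belongs to another account)
  if ! aws s3api head-bucket --bucket "$bucket" 2>/dev/null; then
    aws s3 mb "s3://$bucket" --region "$region" || return 1
  fi
  # table:folder pairs, matching the S3 folders used in the commands above
  for pair in supplier:supplier customer:customer orders:order lineitem:lineitem \
              partsupp:partsupplier part:part nation:nation region:region; do
    t=${pair%%:*}
    prefix=${pair#*:}
    # The trailing slash keeps the local file name under the prefix
    aws s3 cp "$dir/$t.tbl.gz" "s3://$bucket/$prefix/csv/" || return 1
  done
}
```

S3 bucket names are global across all accounts, so call it with a name of your own, for example `upload_tpch my-tpch-bucket`.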
Data can be verified by checking the S3 buckets and folders.
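Beyond browsing the folders, one option is to compare each local .tbl.gz size with the size of the uploaded object. This is a minimal sketch: verify_upload is a hypothetical helper, it assumes GNU stat (as on Amazon Linux 2) and the folder layout of the aws s3 cp commands above, and it uses aws s3api head-object to read each object's ContentLength.

```shell
# Hypothetical helper: compare local .tbl.gz sizes with the uploaded S3 objects
# Usage: verify_upload BUCKET [DBGEN_DIR]
verify_upload() {
  local bucket="$1" dir="${2:-/home/ec2-user/2.17.3/dbgen}"
  local pair t prefix local_size remote_size status=0
  # table:folder pairs, matching the S3 folders used in the commands above
  for pair in supplier:supplier customer:customer orders:order lineitem:lineitem \
              partsupp:partsupplier part:part nation:nation region:region; do
    t=${pair%%:*}
    prefix=${pair#*:}
    local_size=$(stat -c %s "$dir/$t.tbl.gz") || return 1  # GNU stat syntax
    remote_size=$(aws s3api head-object --bucket "$bucket" \
      --key "$prefix/csv/$t.tbl.gz" --query ContentLength --output text) || return 1
    if [ "$local_size" = "$remote_size" ]; then
      echo "OK   $t ($local_size bytes)"
    else
      echo "DIFF $t: local=$local_size s3=$remote_size"
      status=1
    fi
  done
  return "$status"
}
```

For a quick overview of everything in the bucket, `aws s3 ls s3://<bucket>/ --recursive --human-readable --summarize` lists each object with its size and a total.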
This is how we copy data from EC2 to S3.

