
Code conversion from Greenplum to Amazon Redshift: Dealing with arrays, dates, and common expressions


Amazon Redshift is a fully managed service for data lakes, data analytics, and data warehouses for startups, medium enterprises, and large enterprises. Amazon Redshift is used by tens of thousands of businesses around the globe to modernize their data analytics platforms.

Greenplum is an open-source, massively parallel database used for analytics, mostly for on-premises infrastructure. Greenplum is based on the PostgreSQL database engine.

Many customers have found migration from on-premises Greenplum to Amazon Redshift an attractive alternative to continuing to manage Greenplum themselves.

Even though both Greenplum and Amazon Redshift use the open-source PostgreSQL database engine, migration still requires a lot of planning and manual intervention. This post covers the key functions and considerations involved in code conversion from Greenplum to Amazon Redshift. It is focused on the migration of procedures, functions, and views.

Solution overview

AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT) can migrate most of the objects in a heterogeneous database migration from Greenplum to Amazon Redshift. But there are some situations where code conversion teams encounter errors and warnings for views, procedures, and functions while creating them in Amazon Redshift. To address this type of situation, manual conversion of the code is required.

This post focuses on how to handle the following while migrating from Greenplum to Amazon Redshift:

  • Arrays
  • Dates and timestamps
  • Common expressions (regex)

Note that for this post, we use Greenplum 4.3 and Amazon Redshift PostgreSQL 8.2.

Working with array functions

The AWS SCT doesn't convert array functions while migrating from Greenplum or PostgreSQL to Amazon Redshift. Developers need to convert these functions manually. This post outlines the most common array functions:

  • ARRAY_UPPER
  • JSON_EXTRACT_ARRAY_ELEMENT_TEXT and JSON_ARRAY_LENGTH
  • UNNEST ()
  • STRING_AGG()
  • ANY ARRAY()

ARRAY_UPPER()

This function returns the upper bound of an array. It can be used to extract the nth element from an array in PostgreSQL or Greenplum.

The Greenplum code is as follows:

With temp1 as
(
Select 'John' as FirstName, 'Smith' as LastName ,
array['"111-222-3333"','"101-201-3001"','"XXX-YYY-ZZZZ"','NULL'] as PhoneNumbers
union all
Select 'Bob' as FirstName, 'Haris' as LastName ,
array['222-333-4444','201-301-4001','AAA-BBB-CCCC'] as PhoneNumbers
union all
Select 'Mary' as FirstName, 'Jane' as LastName ,
array['333-444-5555','301-401-3001','DDD-EEE-FFFF'] as PhoneNumbers
)
Select Firstname, PhoneNumbers[ARRAY_UPPER(PhoneNumbers,1)]
from temp1

There is no function to extract an element from an array in Amazon Redshift; however, there are two JSON functions that can be used for this purpose:

  • JSON_EXTRACT_ARRAY_ELEMENT_TEXT() – Returns a JSON array element in the outermost array of a JSON string
  • JSON_ARRAY_LENGTH() – Returns the number of elements in the outer array of a JSON string
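As a quick standalone illustration of these two functions (the literal array here is only for demonstration), the element index is zero-based and the length counts the outer elements:

select json_extract_array_element_text('[111,112,113]', 2);
select json_array_length('[111,112,113]');

These should return 113 and 3, respectively.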

See the following code:

With temp1 as
(
Select 'John' as FirstName, 'Smith' as LastName ,
array['"111-222-3333"','"101-201-3001"','"XXX-YYY-ZZZZ"'] as PhoneNumbers
union all
Select 'Bob' as FirstName, 'Haris' as LastName ,
array['"222-333-4444"','"201-301-4001"','"AAA-BBB-CCCC"'] as PhoneNumbers
union all
Select 'Mary' as FirstName, 'Jane' as LastName ,
array['"333-444-5555"','"301-401-3001"','"DDD-EEE-FFFF"'] as PhoneNumbers
)

Select
FirstName
,('['+array_to_string(phoneNumbers,',')+']') as JSONConvertedField
,JSON_EXTRACT_ARRAY_ELEMENT_TEXT
(
'['+array_to_string(phoneNumbers,',')+']'
,JSON_ARRAY_LENGTH('['+array_to_string(phoneNumbers,',')+']')-1
) as LastElementFromArray
from temp1

UNNEST()

UNNEST() is PostgreSQL's system function for semi-structured data, expanding an array, or a combination of arrays, to a set of rows. It was introduced to improve database performance for inserts, updates, and deletes involving thousands of records.

You can use UNNEST() for a basic array, multiple arrays, and multiple arrays with different lengths.

Some of the Amazon Redshift functions used to unnest arrays are split_part, json_extract_path_text, json_array_length, and json_extract_array_element_text.
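For reference, split_part takes a string, a delimiter, and a 1-based position. For example, the following standalone query (the literal string is just for illustration) should return 101-201-3001:

select split_part('111-222-3333&&101-201-3001', '&&', 2);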

In Greenplum, the UNNEST function is used to expand an array to a set of rows:

Select 'A', unnest(array[1,2])

Output
A 1
A 2

with temp1 as
(
Select 'John' as FirstName, 'Smith' as LastName ,
'111-222-3333' as Mobilephone,'101-201-3001' as HomePhone
union all
Select 'Bob' as FirstName, 'Haris' as LastName ,
'222-333-4444' as Mobilephone,'201-301-4001' as HomePhone
union all
Select 'Mary' as FirstName, 'Jane' as LastName ,
'333-444-5555' as Mobilephone,'301-401-3001' as HomePhone
)

select
FirstName
,LastName
,unnest(array['Mobile'::text,'HomePhone'::text]) as PhoneType
,unnest(array[MobilePhone::text,HomePhone::text]) as PhoneNumber
from
temp1
order by 1,2,3

Amazon Redshift doesn't support the UNNEST function; you can use the following workaround:

with temp1 as
(
Select 'John' as FirstName, 'Smith' as LastName ,
'111-222-3333' as Mobilephone,'101-201-3001' as HomePhone
union all
Select 'Bob' as FirstName, 'Haris' as LastName ,
'222-333-4444' as Mobilephone,'201-301-4001' as HomePhone
union all
Select 'Mary' as FirstName, 'Jane' as LastName ,
'333-444-5555' as Mobilephone,'301-401-3001' as HomePhone
),
ns as
(
Select row_number() over(order by 1) as n from pg_tables
)

Select
FirstName
,LastName
,split_part('Mobile,HomePhone',',',ns.n::int) as PhoneType
,split_part(MobilePhone|| '&&' || HomePhone, '&&', ns.n::int) as PhoneNumber
from
temp1, ns
where
ns.n<=regexp_count('Mobile,HomePhone',',')+1
order by 1,2,3
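Both the Greenplum query and the Amazon Redshift workaround should return the same six rows (one row per person and phone type), ordered by FirstName, LastName, and PhoneType:

Bob Haris HomePhone 201-301-4001
Bob Haris Mobile 222-333-4444
John Smith HomePhone 101-201-3001
John Smith Mobile 111-222-3333
Mary Jane HomePhone 301-401-3001
Mary Jane Mobile 333-444-5555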

When the element of an array is in the form of an array itself, use the JSON_EXTRACT_ARRAY_ELEMENT_TEXT() function and JSON_ARRAY_LENGTH:

with ns as
(
Select row_number() over(order by 1) as n from pg_tables
)

Select JSON_EXTRACT_ARRAY_ELEMENT_TEXT('["arrayelement1","arrayelement2"]',ns.n-1)
from ns
where
ns.n<=JSON_ARRAY_LENGTH('["arrayelement1","arrayelement2"]')

STRING_AGG()

The STRING_AGG() function is an aggregate function that concatenates a list of strings and places a separator between them. The function doesn't add the separator at the end of the string. The syntax is as follows:

STRING_AGG ( expression, separator [order_by_clause] )

The Greenplum code is as follows:

with temp1 as
(
Select 'Finance'::text as Dept, 'John'::text as FirstName, 'Smith'::text as LastName
union all
Select 'Finance'::text as Dept, 'John'::text as FirstName, 'Doe'::text as LastName
union all
Select 'Finance'::text as Dept, 'Mary'::text as FirstName, 'Jane'::text as LastName
union all
Select 'Marketing'::text as Dept, 'Bob'::text as FirstName, 'Smith'::text as LastName
union all
Select 'Marketing'::text as Dept, 'Steve'::text as FirstName, 'Smith'::text as LastName
union all
Select 'Account'::text as Dept, 'Phil'::text as FirstName, 'Adams'::text as LastName
union all
Select 'Account'::text as Dept, 'Jim'::text as FirstName, 'Smith'::text as LastName
)

Select dept,STRING_AGG(FirstName||' '||LastName,' ; ') as Employees from temp1 group by dept order by 1

The Amazon Redshift equivalent of the STRING_AGG() function is LISTAGG(). This aggregate function orders the rows for the group according to the ORDER BY expression, then concatenates the values into a single string:

LISTAGG(expression, separator [order_by_clause])

See the following code:

Create temporary table temp1 as
(
Select 'Finance'::text as Dept, 'John'::text as FirstName, 'Smith'::text as LastName
union all
Select 'Finance'::text as Dept, 'John'::text as FirstName, 'Doe'::text as LastName
union all
Select 'Finance'::text as Dept, 'Mary'::text as FirstName, 'Jane'::text as LastName
union all
Select 'Marketing'::text as Dept, 'Bob'::text as FirstName, 'Smith'::text as LastName
union all
Select 'Marketing'::text as Dept, 'Steve'::text as FirstName, 'Smith'::text as LastName
union all
Select 'Account'::text as Dept, 'Phil'::text as FirstName, 'Adams'::text as LastName
union all
Select 'Account'::text as Dept, 'Jim'::text as FirstName, 'Smith'::text as LastName
)

Select dept,LISTAGG(FirstName||' '||LastName,' ; ') as Employees from temp1
group by dept
order by 1

ANY ARRAY()

The PostgreSQL ANY ARRAY() function evaluates and compares the left-hand expression to each element in the array:

Select * from temp1 where DeptName = ANY (ARRAY['10-F','20-F','30-F'])

In Amazon Redshift, the same evaluation can be achieved with the IN operator:

Select * from temp1 where DeptName IN ('10-F','20-F','30-F')

Working with date functions

In this section, we discuss calculating the difference between dates using date_part for Greenplum and datediff for Amazon Redshift.

When an application needs to calculate the difference between the subfields of dates in Greenplum, it uses the date_part function, which allows you to retrieve subfields such as year, month, week, and day. In the following example queries, we calculate the number of completion_days as the difference between originated_date and eco_date.

To calculate the difference between date subfields, Amazon Redshift provides the datediff function. The following queries show how to calculate completion_days as the difference between eco_date and originated_date. DATEDIFF determines the number of date part boundaries that are crossed between the two expressions.
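For instance, a minimal sketch of that completion_days calculation (the orders table and its originated_date and eco_date columns are assumptions for illustration) could look like the following in each engine.

Greenplum:

SELECT date_part('day', eco_date::timestamp - originated_date::timestamp) as completion_days from orders;

Amazon Redshift:

SELECT datediff(day, originated_date, eco_date) as completion_days from orders;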

We compare the Greenplum and Amazon Redshift queries as follows.

The following Greenplum query returns 1 year between 2008-12-31 and 2009-01-01:

SELECT date_part('year', TIMESTAMP '2009-01-01') - date_part('year', TIMESTAMP '2008-12-31') as year;

The following Amazon Redshift query returns 1 year between 2008-12-31 and 2009-01-01:

SELECT datediff(year, '2008-12-31', '2009-01-01') as year;

The following Greenplum query returns 1 month between 2008-12-31 and 2009-01-01:

SELECT (date_part('year', '2009-01-01' :: date) - date_part('year', '2008-12-31' :: date)) * 12 +
(date_part('month', '2009-01-01' :: date) - date_part('month', '2008-12-31' :: date)) as month;

The following Amazon Redshift query returns 1 month between 2008-12-31 and 2009-01-01:

SELECT datediff(month, '2008-12-31', '2009-01-01') as month;

The following Greenplum query returns 0 weeks between 2008-12-31 and 2009-01-01:

SELECT date_part('week', timestamp '2009-01-01') - date_part('week', timestamp '2008-12-31') as week;

The following Amazon Redshift query returns 0 weeks between 2008-12-31 and 2009-01-01:

SELECT datediff(week, '2008-12-31', '2009-01-01') as week;

The following Greenplum query returns 1 day:

SELECT date_part('day', '2009-01-01 24:00:00' :: timestamp - '2008-12-31 24:00:00' :: timestamp) as day;

The following Amazon Redshift query returns 1 day:

SELECT datediff(day, '2008-12-31', '2009-01-01') as day;

The following Greenplum query returns 1 hour:

SELECT date_part('hour', '2009-01-01 22:56:10' :: timestamp - '2008-12-31 21:54:55' :: timestamp) as hour;

The following Amazon Redshift query returns 1 hour:

SELECT datediff(hour, '2009-01-01 21:56:10', '2009-01-01 22:56:10') as hour;

The following Greenplum query returns 3 minutes:

SELECT date_part('minute', '2009-01-01 22:56:10' :: timestamp - '2009-01-01 21:53:10' :: timestamp) as minutes;

The following Amazon Redshift query returns 1 minute:

SELECT datediff(minute, '2009-01-01 21:56:10', '2009-01-01 21:57:55') as minute;

The following Greenplum query returns 40 seconds:

SELECT date_part('second', '2009-01-01 22:56:50' :: timestamp - '2009-01-01 21:53:10' :: timestamp) as seconds;

The following Amazon Redshift query returns 45 seconds:

SELECT datediff(second, '2009-01-01 21:56:10', '2009-01-01 21:56:55') as seconds;

Now let's look at how to use Amazon Redshift to calculate days and weeks in seconds.

The following Amazon Redshift query displays 2 days:

SELECT datediff(second, '2008-12-30 21:56:10', '2009-01-01 21:56:55')/(60*60*24) as days;

The following Amazon Redshift query displays 9 weeks:

SELECT datediff(second, '2008-10-30 21:56:10', '2009-01-01 21:56:55')/(60*60*24*7) as weeks;

For Greenplum, the date subfields need to be in single quotes, whereas for Amazon Redshift we can use date subfields such as year, month, week, day, minute, and second without quotes. For Greenplum, we have to subtract the subfield of one date from the subfield of the other, whereas for Amazon Redshift we can pass both dates to datediff, separated by a comma.

Extract ISOYEAR from date

The ISO 8601 year is a week-numbering year. It begins with the Monday of the week containing the 4th of January, so for dates in early January or late December the ISO year may differ from the Gregorian year. An ISO year has 52 or 53 full weeks (364 or 371 days). The extra week is called a leap week; a year with such a week is called a leap year.

The following Greenplum query displays the ISOYEAR 2020:

SELECT extract (ISOYEAR from '2019-12-30' :: date) as ISOYEARS;

The following Amazon Redshift query displays the ISOYEAR 2020:

SELECT to_char('2019-12-30' :: date, 'IYYY') as ISOYEARS;
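For contrast, the same format applied to 2019-12-28, which still belongs to ISO week 52 of 2019, should display 2019:

SELECT to_char('2019-12-28' :: date, 'IYYY') as ISOYEARS;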

generate_series() function

Greenplum has adopted the PostgreSQL function generate_series(). However, the generate_series function works differently in Amazon Redshift when retrieving data from a table, because it is a leader node-only function.

To display a series of numbers in Amazon Redshift, run the following query on the leader node. In this example, it displays 10 rows, numbered 1–10:

SELECT generate_series(1,10);

To display a series of days for a given date, use the following query. It extracts the day from the given date and subtracts 1, to display a series of numbers from 0–6:

SELECT generate_series(0, extract(day from date '2009-01-07') :: int -1);

However, for queries that fetch records from a table, join with another table's rows, and process data on the compute nodes, generate_series doesn't work, and it generates an Invalid Operation error message. The following code is an example of a SQL statement that works for Greenplum but fails for Amazon Redshift:

SELECT column_1
FROM table_1 t1
JOIN table_2 t2
ON t2.code = t1.code
CROSS JOIN generate_series(1,12) gen(fiscal_month)
WHERE condition_1

For Amazon Redshift, the solution is to create a table to store the sequence data, and rewrite the code as follows:

SELECT column_1
FROM table_t1 t1
JOIN table_t2 t2
ON t2.code = t1.code
CROSS JOIN (select "number" as fiscal_month FROM table_t3 WHERE "number"<=12) gen
WHERE condition_1
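The rewritten query assumes the sequence table already exists. A minimal sketch of creating and populating it, using the table name table_t3 and the "number" column from the query above, could be:

CREATE TABLE table_t3 ("number" int);
INSERT INTO table_t3 ("number") VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12);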

Working with regular expressions (regex functions)

Amazon Redshift and Greenplum both support three conditions for pattern matching:

  • LIKE
  • SIMILAR TO
  • POSIX operators

In this post, we don't discuss all of these pattern matching conditions in detail. Instead, we discuss a few regex functions and regex escape characters that aren't supported by Amazon Redshift.

Regexp_split_to_table function

The regexp_split_to_table function splits a string using a POSIX regular expression pattern as the delimiter.

This function has the following syntax:

Regexp_split_to_table(string, pattern [, flags])

For Greenplum, we use the following query:

select regexp_split_to_table('bat,cat,hat', ',') as regexp_split_table_GP

For Amazon Redshift, the regexp_split_to_table function needs to be converted using the Amazon Redshift split_part function:

with ns as
(
select row_number() over(order by 1) as n from pg_tables
)
select split_part('bat,cat,hat', ',', ns.n::int) as regexp_split_table_RS
from ns
where ns.n <= regexp_count('bat,cat,hat', ',') + 1

Another possible approach (a sketch based on the JSON functions shown earlier) is to convert the delimited string into a JSON array first:

with ns as
(
select row_number() over(order by 1) as n from pg_tables
)
select json_extract_array_element_text('["'||replace('bat,cat,hat',',','","')||'"]', ns.n::int - 1) as regexp_split_table_RS
from ns
where ns.n <= json_array_length('["'||replace('bat,cat,hat',',','","')||'"]')

Substring from regex expressions

Substring (the string from the regex pattern) extracts the substring or value matching the pattern that is passed in. If there is no match, null is returned. For more information, refer to Pattern Matching.

We use the following code in Greenplum:

create temp table data1 ( col1 varchar );
insert into data1 values ('hellohowareyou 12687687abcd');
select substring( col1 from '[A-Za-z]+$') from data1;

We can use the regexp_substr function to convert this code for Amazon Redshift. It returns the characters extracted from a string by searching for a regular expression pattern. The syntax is as follows:

REGEXP_SUBSTR ( source_string, pattern [, position [, occurrence [, parameters ] ] ] )

select regexp_substr( col1, '[A-Za-z]+$') as substring_from_rs from data1

Key points while converting regular expression escapes

The PostgreSQL escape character E doesn't work in Amazon Redshift. Additionally, the following Greenplum regular expression constraints aren't supported in Amazon Redshift:

  • \m – Matches only at the beginning of a word
  • \y – Matches only at the beginning or end of a word

For Amazon Redshift, use \\< and \\>, or [[:<:]] and [[:>:]], instead.

Use the following code for Greenplum:

select col1,
case
when (col1) ~ E'\\m[0-9]{2}[A-Z]{1}[0-9]{1}' then
regexp_replace(col1, E'([0-9]{2})([A-Z]{1})([0-9]{1})', E'\\2')
else 'nothing'
end as regex_test
from temp1123

Use the following code for Amazon Redshift:

select col1,
case
when (col1) ~ '\\<[0-9]{2}[A-Z]{1}[0-9]{1}\\>' then
regexp_replace(col1,'([0-9]{2})([A-Z]{1})([0-9]{1})','\\2')
else 'nothing'
end as regex_test
from temp1123

OR

select col1,
case
when (col1) ~ '[[:<:]][0-9]{2}[A-Z]{1}[0-9]{1}[[:>:]]' then
regexp_replace(col1,'([0-9]{2})([A-Z]{1})([0-9]{1}) (.*)','\\2')
else 'nothing'
end as regex_test
from temp1123

Conclusion

For a heterogeneous database migration from Greenplum to Amazon Redshift, you can use AWS DMS and the AWS SCT to migrate most of the database objects, such as tables, views, stored procedures, and functions.

There are some situations in which a function is used in the source environment and the target environment doesn't support the same function. In these cases, manual conversion is required to produce the same result set and complete the database migration.

In some cases, using a new window function supported by the target environment proves more efficient for analytical queries that process petabytes of data.

This post covered several situations where manual code conversion is required, which can also improve code efficiency and make queries run faster.

If you have any questions or suggestions, please share your feedback.


About the Authors

Jagrit Shrestha is a Database Consultant at Amazon Web Services (AWS). He works as a database specialist, helping customers migrate their on-premises database workloads to AWS and providing technical guidance.

Ishwar Adhikary is a Database Consultant at Amazon Web Services (AWS). He works closely with customers to modernize their database and application infrastructures. His focus area is the migration of relational databases from on-premises data centers to the AWS Cloud.

Shrenik Parekh works as a Database Consultant at Amazon Web Services (AWS). His expertise includes database migration assessment, database migration, and modernizing database environments with purpose-built databases using AWS Cloud database services. He is also focused on AWS services for data analytics. In his spare time, he loves hiking, yoga, and other outdoor activities.

Santhosh Meenhallimath is a Data Architect at AWS. He works on building analytical solutions, building data lakes, and migrating databases into AWS.
