Amazon Redshift is a fully managed service for data lakes, data analytics, and data warehouses for startups, medium enterprises, and large enterprises. Amazon Redshift is used by tens of thousands of customers across the globe to modernize their data analytics platforms.
Greenplum is an open-source, massively parallel database used for analytics, mostly for on-premises infrastructure. Greenplum is based on the PostgreSQL database engine.
Many customers have found migration to Amazon Redshift from Greenplum an attractive option instead of managing on-premises Greenplum for the following reasons:
Even though both Greenplum and Amazon Redshift use the open-source PostgreSQL database engine, migration still requires a lot of planning and manual intervention. This post covers the key functions and considerations while performing code conversion from Greenplum to Amazon Redshift. It is focused on the migration of procedures, functions, and views.
Solution overview
AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT) can migrate most of the objects in a heterogeneous database migration from Greenplum to Amazon Redshift. However, there are some situations where code conversion teams encounter errors and warnings for views, procedures, and functions while creating them in Amazon Redshift. To address this type of situation, manual conversion of the code is required.
This post focuses on how to handle the following while migrating from Greenplum to Amazon Redshift:
- Arrays
- Dates and timestamps
- Regular expressions (regex)
Note that for this post, we use Greenplum 4.3 and Amazon Redshift PostgreSQL 8.2.
Working with array functions
The AWS SCT doesn't convert array functions while migrating from Greenplum or PostgreSQL to Amazon Redshift. Developers have to extensively convert these functions manually. This post outlines the most common array functions:
- ARRAY_UPPER
- JSON_EXTRACT_ARRAY_ELEMENT_TEXT and JSON_ARRAY_LENGTH
- UNNEST()
- STRING_AGG()
- ANY ARRAY()
ARRAY_UPPER()
This function returns the upper bound of an array. It can be used to extract the nth element from an array in PostgreSQL or Greenplum.
The Greenplum code is as follows:
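A minimal sketch of the Greenplum pattern (the array literal is an assumption for illustration):

```sql
-- array_upper returns the upper bound of the array's first dimension
SELECT array_upper(ARRAY[101, 102, 103, 104], 1) AS upper_bound;   -- returns 4

-- Use the upper bound as a subscript to extract the last element
SELECT (ARRAY[101, 102, 103, 104])[array_upper(ARRAY[101, 102, 103, 104], 1)] AS last_element;   -- returns 104
```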
There is no function to extract an element from an array in Amazon Redshift; however, there are two JSON functions that can be used for this purpose:
- JSON_EXTRACT_ARRAY_ELEMENT_TEXT() – Returns a JSON array element in the outermost array of a JSON string
- JSON_ARRAY_LENGTH() – Returns the number of elements in the outer array of a JSON string
See the following code:
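A hedged equivalent in Amazon Redshift, assuming the array is supplied as a JSON-formatted string:

```sql
-- json_array_length returns the element count; subtract 1 because
-- json_extract_array_element_text uses a zero-based index
SELECT json_extract_array_element_text(
           '[101,102,103,104]',
           json_array_length('[101,102,103,104]') - 1
       ) AS last_element;   -- returns 104
```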
UNNEST()
UNNEST() is PostgreSQL's system function for semi-structured data, expanding an array, or a combination of arrays, to a set of rows. It was introduced to improve database performance for thousands of records for inserts, updates, and deletes.
You can use UNNEST() for a basic array, multiple arrays, and multiple arrays with different lengths.
Some of the Amazon Redshift functions used to unnest arrays are split_part, json_extract_path_text, json_array_length, and json_extract_array_element_text.
In Greenplum, the UNNEST function is used to expand an array to a set of rows:
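A sketch of a query consistent with the output shown below (the literal values are assumptions):

```sql
-- Expand the array into one row per element
SELECT 'A' AS col1,
       unnest(ARRAY[1, 2]) AS col2;
```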
Output:
A 1
A 2
Amazon Redshift doesn't support the UNNEST function; you can use the following workaround:
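One possible workaround, sketched here under the assumption that the array arrives as a JSON-formatted string and that a small helper table of index values (seq_0_to_9 is a hypothetical name) is acceptable:

```sql
-- Hypothetical helper table holding the index values 0-9
CREATE TABLE seq_0_to_9 (idx INT);
INSERT INTO seq_0_to_9 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Produce one row per array element, mirroring the UNNEST output above
SELECT 'A' AS col1,
       json_extract_array_element_text('[1,2]', s.idx) AS col2
FROM seq_0_to_9 s
WHERE s.idx < json_array_length('[1,2]')
ORDER BY s.idx;
```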
STRING_AGG()
The STRING_AGG() function is an aggregate function that concatenates a list of strings and places a separator between them. The function doesn't add the separator at the end of the string. See the following code:
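The general form looks like the following (the argument names are placeholders):

```sql
string_agg(expression, delimiter)
```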
The Greenplum code is as follows:
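A minimal sketch, assuming a hypothetical employees table with dept and emp_name columns:

```sql
-- Concatenate employee names per department, separated by commas
SELECT dept,
       string_agg(emp_name, ',') AS emp_list
FROM employees
GROUP BY dept;
```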
The Amazon Redshift equivalent for the STRING_AGG() function is LISTAGG(). This aggregate function orders the rows for that group according to the ORDER BY expression, then concatenates the values into a single string:
See the following code:
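The equivalent Amazon Redshift sketch for the same hypothetical employees table:

```sql
-- LISTAGG orders the values within each group before concatenating them
SELECT dept,
       listagg(emp_name, ',') WITHIN GROUP (ORDER BY emp_name) AS emp_list
FROM employees
GROUP BY dept;
```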
ANY ARRAY()
The PostgreSQL ANY ARRAY() expression compares the left-hand expression to each element in the array:
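A Greenplum sketch, reusing the hypothetical employees table:

```sql
-- Return employees whose department matches any element of the array
SELECT emp_name, dept
FROM employees
WHERE dept = ANY (ARRAY['Finance', 'HR', 'Sales']);
```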
In Amazon Redshift, the evaluation can be achieved with an IN operator:
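The same filter rewritten for Amazon Redshift:

```sql
SELECT emp_name, dept
FROM employees
WHERE dept IN ('Finance', 'HR', 'Sales');
```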
Working with date functions
In this section, we discuss calculating date differences with date_part for Greenplum and datediff for Amazon Redshift.
When the application needs to calculate the difference between the subfields of dates in Greenplum, it uses the function date_part, which allows you to retrieve subfields such as year, month, week, and day. In the following example queries, we calculate completion_days as the difference between originated_date and eco_date.
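A sketch of the Greenplum pattern, assuming a hypothetical loan_tab table with originated_date and eco_date columns:

```sql
-- date_part('day', ...) extracts the day subfield of the interval between the two dates
SELECT loan_id,
       date_part('day', eco_date::timestamp - originated_date::timestamp) AS completion_days
FROM loan_tab;
```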
To calculate the difference between the subfields of dates, Amazon Redshift has the function datediff. The following queries show an example of how to calculate completion_days as the difference between eco_date and originated_date. DATEDIFF determines the number of date part boundaries that are crossed between the two expressions.
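The equivalent Amazon Redshift sketch for the same hypothetical table:

```sql
-- DATEDIFF counts the day boundaries crossed between the two dates
SELECT loan_id,
       datediff(day, originated_date, eco_date) AS completion_days
FROM loan_tab;
```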
We compare the Greenplum and Amazon Redshift queries as follows:
The following Greenplum query returns 1 year between 2009-01-01 and 2009-12-31:
The following Amazon Redshift query returns 1 year between 2009-01-01 and 2009-12-31:
The following Greenplum query returns 1 month between 2009-01-01 and 2008-12-31:
The following Amazon Redshift query returns 1 month between 2009-01-01 and 2008-12-31:
The following Greenplum query returns 0 weeks between 2009-01-01 and 2009-12-31:
The following Amazon Redshift query returns 0 weeks between 2009-01-01 and 2009-12-31:
The following Greenplum query returns 1 day:
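A possible form using the subfield-subtraction pattern, with assumed timestamps:

```sql
SELECT date_part('day', timestamp '2009-01-02 00:00:00') - date_part('day', timestamp '2009-01-01 00:00:00') AS days;   -- 1
```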
The following Amazon Redshift query returns 1 day:
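A possible Amazon Redshift form, with the same assumed dates:

```sql
SELECT datediff(day, '2009-01-01', '2009-01-02') AS days;   -- 1
```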
The following Greenplum query returns 1 hour:
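A possible form, with assumed timestamps:

```sql
SELECT date_part('hour', timestamp '2009-01-01 11:00:00') - date_part('hour', timestamp '2009-01-01 10:00:00') AS hours;   -- 1
```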
The following Amazon Redshift query returns 1 hour:
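A possible Amazon Redshift form, with the same assumed timestamps:

```sql
SELECT datediff(hour, '2009-01-01 10:00:00', '2009-01-01 11:00:00') AS hours;   -- 1
```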
The following Greenplum query returns 3 minutes:
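A possible form, with assumed timestamps whose minute subfields differ by 3:

```sql
SELECT date_part('minute', timestamp '2009-01-01 10:03:00') - date_part('minute', timestamp '2009-01-01 10:00:00') AS minutes;   -- 3
```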
The following Amazon Redshift query returns 1 minute:
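A possible Amazon Redshift form, with assumed timestamps that cross a single minute boundary:

```sql
SELECT datediff(minute, '2009-01-01 10:00:00', '2009-01-01 10:01:00') AS minutes;   -- 1
```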
The following Greenplum query returns 40 seconds:
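A possible form, with assumed timestamps:

```sql
SELECT date_part('second', timestamp '2009-01-01 10:00:40') - date_part('second', timestamp '2009-01-01 10:00:00') AS seconds;   -- 40
```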
The following Amazon Redshift query returns 45 seconds:
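A possible Amazon Redshift form, with assumed timestamps:

```sql
SELECT datediff(second, '2009-01-01 10:00:00', '2009-01-01 10:00:45') AS seconds;   -- 45
```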
Now let's look at how we use Amazon Redshift to calculate days and weeks in seconds.
The following Amazon Redshift query displays 2 days:
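One way to express this, assuming two dates that are exactly 2 days (172,800 seconds) apart:

```sql
SELECT datediff(second, '2009-01-01', '2009-01-03') / 86400 AS days;   -- 2
```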
The following Amazon Redshift query displays 9 weeks:
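A similar sketch, assuming two dates that are exactly 63 days (9 weeks) apart:

```sql
SELECT datediff(second, '2009-01-01', '2009-03-05') / (86400 * 7) AS weeks;   -- 9
```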
For Greenplum, the date subfields need to be in single quotes, whereas for Amazon Redshift, we can use date subfields such as year, month, week, day, minute, and second without quotes. For Greenplum, we have to subtract the subfield of one date from the subfield of the other, whereas for Amazon Redshift we can use commas to separate the two dates.
Extract ISOYEAR from date
ISO 8601 defines a week-numbering year. It begins with the Monday of the week containing the 4th of January, so for dates in early January or late December, the ISO year may be different from the Gregorian year. An ISO year has 52 or 53 full weeks (364 or 371 days). The extra week is called a leap week; a year with such a week is called a leap year.
The following Greenplum query displays the ISOYEAR 2020:
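A sketch, assuming the Greenplum version in use supports the isoyear field in EXTRACT; 2019-12-30 falls in ISO week 1 of 2020:

```sql
SELECT extract(isoyear FROM date '2019-12-30') AS iso_year;   -- 2020
```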
The following Amazon Redshift query displays the ISOYEAR 2020:
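One possible Amazon Redshift approach uses TO_CHAR with the IYYY (ISO year) format string, assuming the same input date:

```sql
SELECT to_char(date '2019-12-30', 'IYYY') AS iso_year;   -- '2020'
```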
The generate_series() function
Greenplum has adopted the PostgreSQL function generate_series(). However, the generate_series function works differently with Amazon Redshift while retrieving records from a table because it is a leader node-only function.
To display a series of numbers in Amazon Redshift, run the following query on the leader node. In this example, it displays 10 rows, numbered 1–10:
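For example:

```sql
-- Runs on the leader node because no table is referenced
SELECT generate_series(1, 10);
```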
To display a series of days for a given date, use the following query. It extracts the day from the given date and subtracts 1, to display a series of numbers from 0–6:
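A sketch, assuming 2009-01-07 as the input date (day = 7, so the series runs from 0–6):

```sql
SELECT generate_series(0, extract(day FROM date '2009-01-07')::int - 1);
```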
However, for queries that fetch records from a table, join with another table's rows, and process data on the compute nodes, it doesn't work, and generates an Invalid Operation error message. The following code is an example of a SQL statement that works for Greenplum but fails for Amazon Redshift:
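A sketch of such a statement, assuming a hypothetical orders table; the table scan runs on the compute nodes while generate_series() is leader node-only:

```sql
-- Works in Greenplum; fails in Amazon Redshift with an Invalid Operation error
SELECT o.order_id,
       gs.n
FROM orders o
CROSS JOIN generate_series(1, 3) AS gs(n);
```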
For Amazon Redshift, the solution is to create a table to store the sequence data, and rewrite the code as follows:
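A sketch of the rewrite under the same assumptions, using a small sequence table in place of generate_series():

```sql
-- One-time setup: persist the sequence values in a table
CREATE TABLE seq_numbers (n INT);
INSERT INTO seq_numbers VALUES (1), (2), (3);

-- Rewrite of the failing query using the sequence table
SELECT o.order_id,
       s.n
FROM orders o
CROSS JOIN seq_numbers s;
```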
Working with regular expressions (regex functions)
Amazon Redshift and Greenplum both support three conditions for pattern matching:
- LIKE
- SIMILAR TO
- POSIX operators
In this post, we don't discuss all of these pattern matching methods in detail. Instead, we discuss a few regex functions and regex escape characters that aren't supported by Amazon Redshift.
Regexp_split_to_table function
The regexp_split_to_table function splits a string using a POSIX regular expression pattern as the delimiter.
This function has the following syntax:
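The PostgreSQL signature is:

```sql
regexp_split_to_table(string text, pattern text [, flags text]) RETURNS SETOF text
```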
For Greenplum, we use the following query:
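A minimal sketch that splits a space-delimited string into rows (the input string is an assumption):

```sql
-- Returns three rows: one, two, three
SELECT regexp_split_to_table('one two three', E'\\s+') AS word;
```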
For Amazon Redshift, the regexp_split_to_table function needs to be converted using the Amazon Redshift split_part function:
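A sketch of the conversion; split_part(string, delimiter, position) returns a single field per call, so each position is selected explicitly:

```sql
SELECT split_part('one two three', ' ', 1) AS word_1,
       split_part('one two three', ' ', 2) AS word_2,
       split_part('one two three', ' ', 3) AS word_3;
```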
Another approach to converting regexp_split_to_table is as follows:
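Another possible form produces one row per token by joining split_part to a small numbers table (seq_numbers is the hypothetical table created in the generate_series example):

```sql
-- regexp_count counts the delimiters; adding 1 gives the number of tokens
SELECT split_part('one two three', ' ', s.n) AS word
FROM seq_numbers s
WHERE s.n <= regexp_count('one two three', ' ') + 1
ORDER BY s.n;
```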
Substring from regex expressions
Substring (the string from the regex pattern) extracts the substring or value matching the pattern that is passed on. If there is no match, null is returned. For more information, refer to Pattern Matching.
We use the following code in Greenplum:
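A Greenplum sketch (the input string and pattern are assumptions):

```sql
-- Extract the first run of digits from the string
SELECT substring('order-12345-shipped' FROM '[0-9]+') AS order_number;   -- 12345
```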
We can use the regexp_substr function to convert this code to Amazon Redshift. It returns the characters extracted from a string by searching for a regular expression pattern. The syntax is as follows:
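The example values below are assumptions for illustration:

```sql
-- Syntax
REGEXP_SUBSTR(source_string, pattern [, position [, occurrence [, parameters]]])

-- Example: extract the first run of digits
SELECT regexp_substr('order-12345-shipped', '[0-9]+') AS order_number;   -- 12345
```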
Key points while converting regular expression escapes
The PostgreSQL escape character E doesn't work in Amazon Redshift. Additionally, the following Greenplum regular expression constraints aren't supported in Amazon Redshift:
- \m – Matches only at the beginning of a word
- \y – Matches only at the beginning or end of a word
For Amazon Redshift, use \< and \>, or [[:<:]] and [[:>:]], instead.
Use the following code for Greenplum:
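A sketch of the Greenplum form using the E escape syntax and the \m constraint (the sample string is an assumption):

```sql
-- \m anchors the match at the beginning of a word
SELECT 'migrate to redshift' ~ E'\\mredshift' AS matched;   -- true
```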
Use the following code for Amazon Redshift:
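The equivalent Amazon Redshift sketch using the [[:<:]] word-start bracket expression noted above:

```sql
SELECT 'migrate to redshift' ~ '[[:<:]]redshift' AS matched;   -- true
```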
OR
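An alternative sketch that tests for a whole-word match with regexp_count instead of the ~ operator:

```sql
SELECT regexp_count('migrate to redshift', '[[:<:]]redshift[[:>:]]') > 0 AS matched;   -- true
```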
Conclusion
For a heterogeneous database migration from Greenplum to Amazon Redshift, you can use AWS DMS and the AWS SCT to migrate most of the database objects, such as tables, views, stored procedures, and functions.
There are some situations in which one function is used in the source environment and the target environment doesn't support the same function. In this case, manual conversion is required to produce the same result set and complete the database migration.
In some cases, using a new window function supported by the target environment proves more efficient for analytical queries that process petabytes of data.
This post covered several situations where manual code conversion is required, which can also improve code efficiency and make queries more efficient.
If you have any questions or suggestions, please share your feedback.
About the Authors
Jagrit Shrestha is a Database Consultant at Amazon Web Services (AWS). He works as a database specialist helping customers migrate their on-premises database workloads to AWS and provides technical guidance.
Ishwar Adhikary is a Database Consultant at Amazon Web Services (AWS). He works closely with customers to modernize their database and application infrastructures. His focus area is migration of relational databases from on-premises data centers to the AWS Cloud.
Shrenik Parekh works as a Database Consultant at Amazon Web Services (AWS). He has expertise in database migration assessment and database migration, and in modernizing database environments with purpose-built databases using AWS Cloud database services. He is also focused on AWS services for data analytics. In his spare time, he loves hiking, yoga, and other outdoor activities.
Santhosh Meenhallimath is a Data Architect at AWS. He works on building analytics solutions, building data lakes, and migrating databases into AWS.