Creating Hive schema for frequently changing metadata structure for SAS table
I have a SAS master dataset that is appended to every month, with a new column added for each month's values. For example, the columns look like this:
Name Address Column_201809 Column_201810 Column_201811
Can you please suggest how I should handle these schema changes when writing this data back to Hadoop?
hadoop hive sas hdfs
Do you have to keep adding extra columns? This would be much easier to deal with if you had yyyymm as an extra column and appended new rows to your table instead.
– user667489
Nov 28 '18 at 10:13
Unfortunately I cannot do that, as my master data should have unique rows only, and inserting rows for each month would duplicate the data.
– P.Sharma
Nov 28 '18 at 10:15
1
In that case you would just need to change the constraint so that each combination of date + id is unique.
– user667489
Nov 28 '18 at 10:40
That could be one option. What I am looking for is something like schema evolution, so that such changes can be incorporated efficiently at the schema level on the Hadoop side.
– P.Sharma
Nov 28 '18 at 10:53
That is simply a difficult, denormalized table design. You are placing date information in the table schema itself instead of in rows. The *date columns seem to be detail information and might not be considered master-level records. Is it possible you are looking for a slowly changing dimension (SCD) solution?
– Richard
Nov 28 '18 at 13:03
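The row-based layout suggested in the comments above can be illustrated with a minimal sketch. It assumes the monthly columns follow the Column_yyyymm naming shown in the question and that Name plus Address identifies a master record; the function name wide_to_long, the output field names yyyymm and value, and the sample values are hypothetical and only for illustration.

```python
import re

# Matches the monthly columns from the question, e.g. Column_201809.
MONTH_COL = re.compile(r"^Column_(\d{6})$")


def wide_to_long(rows, key_cols=("Name", "Address")):
    """Turn one wide row per entity into one row per (entity, yyyymm).

    key_cols is an assumption: the columns that identify a master record.
    """
    long_rows = []
    for row in rows:
        key = {k: row[k] for k in key_cols}
        for col, value in row.items():
            m = MONTH_COL.match(col)
            # Skip non-monthly columns and months with no value.
            if m and value is not None:
                long_rows.append({**key, "yyyymm": m.group(1), "value": value})
    return long_rows


# Illustrative data only; the values are made up.
wide = [
    {"Name": "A", "Address": "X", "Column_201809": 1, "Column_201810": 2},
    {"Name": "B", "Address": "Y", "Column_201809": 3, "Column_201810": None},
]
long_rows = wide_to_long(wide)

# Uniqueness now holds on the combination of key columns and yyyymm.
keys = [(r["Name"], r["Address"], r["yyyymm"]) for r in long_rows]
assert len(keys) == len(set(keys))
```

With this shape the target table keeps a fixed set of columns (Name, Address, yyyymm, value), so a new month arrives as new rows rather than as a schema change. Whether that fits depends on how downstream SAS jobs consume the data.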
asked Nov 28 '18 at 9:07
P.Sharma