Creating Hive schema for frequently changing metadata structure for SAS table

I have a SAS master dataset that is updated every month: each month a new column is added to hold that month's values, for example:

Name Address Column_201809 Column_201810 Column_201811

Can you please suggest how I should handle these schema changes when writing this data back to Hadoop?

asked Nov 28 '18 at 9:07

P.Sharma

Do you have to keep adding extra columns? This would be much easier to deal with if you had yyyymm as an extra column and appended new rows to your table instead.

– user667489
Nov 28 '18 at 10:13
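The row-based layout suggested above can be sketched as follows. This is a minimal Python illustration, not part of the original thread: it assumes each SAS record has been exported as a dict, that monthly columns always follow the `Column_yyyymm` naming shown in the question, and that the names `wide_to_long`, `yyyymm` and `value` are hypothetical. Each monthly column becomes its own row, so a new month adds rows rather than columns and the target schema stays fixed.

```python
import re

# Matches monthly columns such as "Column_201809" and captures the yyyymm part.
MONTH_COL = re.compile(r"^Column_(\d{6})$")


def wide_to_long(rows):
    """Turn one wide record per person into one record per (person, month).

    Each input row is a dict like
    {"Name": ..., "Address": ..., "Column_201809": ..., ...}.
    Each output row has the fixed keys Name, Address, yyyymm, value.
    """
    long_rows = []
    for row in rows:
        for key, value in row.items():
            match = MONTH_COL.match(key)
            if match is None:
                continue  # Name, Address and any other fixed attribute
            if value is None:
                continue  # assumption: a missing month produces no row
            long_rows.append({
                "Name": row["Name"],
                "Address": row["Address"],
                "yyyymm": match.group(1),
                "value": value,
            })
    return long_rows


# Hypothetical sample data in the wide layout from the question.
wide = [
    {"Name": "A", "Address": "X St", "Column_201809": 10, "Column_201810": 12},
    {"Name": "B", "Address": "Y St", "Column_201809": 7, "Column_201810": None},
]
for r in wide_to_long(wide):
    print(r)
```

Skipping `None` values is a design choice; keeping them as rows with a null `value` works just as well. In Hive, a fixed layout like this could, for example, be partitioned by `yyyymm`.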

Unfortunately I cannot do that, as my master data must contain unique rows only, and inserting rows for each month would duplicate the data.

– P.Sharma
Nov 28 '18 at 10:15

In that case you would just need to change the constraint so that each combination of date + id is unique.

– user667489
Nov 28 '18 at 10:40
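The composite-key idea can be checked with a short sketch. It assumes, hypothetically, that `Name` identifies a master record (the question shows no explicit id column) and works on rows in the long layout with a `yyyymm` field. The helper `duplicate_keys` is illustrative only; Hive does not enforce key uniqueness on its own, so a check like this would typically run in the load job before writing.

```python
from collections import Counter


def duplicate_keys(rows, key_fields=("Name", "yyyymm")):
    """Return the composite keys (e.g. Name + yyyymm) that occur more than once."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical long-layout rows: one row per person per month.
long_rows = [
    {"Name": "A", "yyyymm": "201809", "value": 10},
    {"Name": "A", "yyyymm": "201810", "value": 12},
    {"Name": "B", "yyyymm": "201809", "value": 7},
]
print(duplicate_keys(long_rows))  # [] : same Name in different months is fine

# Loading the same month twice for the same person violates the key.
long_rows.append({"Name": "A", "yyyymm": "201810", "value": 99})
print(duplicate_keys(long_rows))  # [('A', '201810')]
```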

That could be one option. What I am looking for is something like schema evolution, so that such changes can be incorporated efficiently at the schema level on the Hadoop side.

– P.Sharma
Nov 28 '18 at 10:53
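If the wide layout has to stay, one way to approach the schema-evolution side is to compare the SAS column list with the Hive table's columns and generate Hive's `ALTER TABLE ... ADD COLUMNS` statement for any new monthly columns. The sketch below relies on several assumptions: the table name `master_hist`, the helper `add_columns_ddl`, and the `DOUBLE` type (SAS numeric variables are 8-byte floating point) are hypothetical choices, and obtaining the two column lists (for example via `DESCRIBE` on the Hive side) is left out.

```python
import re

# Only monthly columns such as "Column_201811" are considered for evolution.
MONTH_COL = re.compile(r"^Column_\d{6}$")


def add_columns_ddl(table, hive_columns, sas_columns, col_type="DOUBLE"):
    """Build Hive DDL for monthly columns present in SAS but not yet in Hive.

    hive_columns: column names currently in the Hive table.
    sas_columns:  column names in the latest SAS master dataset.
    Returns None when there is nothing to add.
    """
    # Hive column names are case-insensitive, so compare in lower case.
    existing = {c.lower() for c in hive_columns}
    new_cols = [c for c in sas_columns
                if MONTH_COL.match(c) and c.lower() not in existing]
    if not new_cols:
        return None
    cols_sql = ", ".join(f"{c.lower()} {col_type}" for c in new_cols)
    return f"ALTER TABLE {table} ADD COLUMNS ({cols_sql})"


hive_cols = ["name", "address", "column_201809", "column_201810"]
sas_cols = ["Name", "Address", "Column_201809", "Column_201810", "Column_201811"]
stmt = add_columns_ddl("master_hist", hive_cols, sas_cols)
print(stmt)  # ALTER TABLE master_hist ADD COLUMNS (column_201811 DOUBLE)
```

Hive adds such columns at the end of the existing (non-partition) columns, and rows written before the change typically read the new column as NULL. The table still grows by one column every month, which is the design concern raised in the comments.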

That is simply a difficult, denormalized table design. You are placing date information in the table schema itself instead of in the rows. The *date columns seem to be detail information and might not be considered master-level records. Is it possible you are looking for a slowly changing dimension (SCD) solution?

– Richard
Nov 28 '18 at 13:03

hadoop hive sas hdfs