How can I achieve reduction using OpenMP Tasks?



























I have this OpenMP code that performs a simple reduction:



for (k = 0; k < m; k++)
{
    #pragma omp parallel for private(i) reduction(+:mysum) schedule(static)
    for (i = 0; i < m; i++)
    {
        mysum += a[i][k] * a[i][k];
    }
}


I want to write code equivalent to this, but using OpenMP tasks. Here is what I tried, following this article:



for (k = 0; k < m; k++)
{
    #pragma omp parallel reduction(+:mysum)
    {
        #pragma omp single
        {
            for (i = 0; i < m; i++)
            {
                #pragma omp task private(i) shared(k)
                {
                    partialSum += a[i][k] * a[i][k];
                }
            }
        }

        #pragma omp taskwait
        mysum += partialSum;
    }
}


The variable partialSum is declared as threadprivate and it's also a global variable:



int partialSum = 0;
#pragma omp threadprivate(partialSum)


a is a simple array of ints (m x m).



The problem is that when I run the code above (the one with tasks) multiple times, I get different results.



Do you have any idea what I should change to make this work?



Thank you respectfully

































  • In your second code, partialSum is shared among all your threads. The reduction handles making private copies of mysum and combining them at the end, but the same treatment is not extended to partialSum, which therefore is the subject of a data race. The slide deck you linked uses a threadprivate() directive to address that problem. I'm not certain that would be sufficient for you, but it would at least resolve the data race.

    – John Bollinger
    Nov 24 '18 at 16:08













  • I don't think that partialSum is shared among all threads because I also declare it as threadPrivate, exactly as in that article

    – Cosmin Ioniță
    Nov 25 '18 at 17:17






  • I guess I overlooked that at the end of the question. Please present a Minimal, Complete, and Verifiable example exhibiting the problem. Not only will that reduce the likelihood of such misunderstandings, but the additional context may prove important.

    – John Bollinger
    Nov 25 '18 at 17:43











  • What misunderstandings? I stated the fact that partialSum is threadPrivate from the beginning. I think that you should have read the entire question from the beginning.

    – Cosmin Ioniță
    Nov 26 '18 at 8:23























c parallel-processing openmp reduction
















asked Nov 24 '18 at 15:49









Cosmin Ioniță

























1 Answer
































Private variables are uninitialized (at least, they are not initialized from their value outside the construct), so i should be firstprivate.



If you just remove private(i) shared(k), the defaults do the right thing for k: it comes from outside the parallel region and is therefore implicitly shared in the parallel region, which also makes it implicitly shared in the task-generating construct. Right now i is also shared/shared. If you define it locally instead (for (int i...)), it becomes implicitly private to the parallel region and thus implicitly firstprivate in the task-generating construct.
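As a minimal sketch of the data-sharing rules described above, the task version could look like the following. The function name sum_squares_tasks, the long accumulator type, and the variable-length array parameter are assumptions made for illustration and are not in the question. Resetting partialSum at the top of each parallel region is also an addition of this sketch, on the assumption that a threadprivate value may otherwise carry over from the previous k iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Per-thread accumulator, as in the question (type widened to long here). */
static long partialSum = 0;
#pragma omp threadprivate(partialSum)

/* Hypothetical helper: sum of squares of all entries of an m x m matrix. */
long sum_squares_tasks(int m, int a[m][m])
{
    assert(a != NULL);
    long mysum = 0;

    for (int k = 0; k < m; k++) {
        /* k is declared outside the parallel region: shared by default. */
        #pragma omp parallel reduction(+:mysum)
        {
            /* Assumption of this sketch: start each region from zero. */
            partialSum = 0;

            #pragma omp single
            {
                /* i is declared inside the region, so it is private here
                   and firstprivate in each task by default. */
                for (int i = 0; i < m; i++) {
                    #pragma omp task
                    {
                        partialSum += (long)a[i][k] * a[i][k];
                    }
                }
            } /* implicit barrier: all generated tasks have completed */

            /* mysum is this thread's private reduction copy. */
            mysum += partialSum;
        }
    }
    return mysum;
}
```

No explicit data-sharing clauses appear on the task: the defaults give firstprivate for i and shared for k and a, which is the behaviour the paragraph above describes.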



You should also add



#pragma omp atomic
mysum += partialSum;


On the other hand, you don't necessarily need the taskwait (see this answer)



Note that the talk uses firstprivate correctly.
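One possible reading of the atomic suggestion is sketched here, assuming the reduction clause is dropped so that mysum is shared by all threads and each thread adds its own partial result atomically. The names sum_squares_atomic and partial are hypothetical, and the per-region reset of partial is an assumption of this sketch rather than something stated in the answer.

```c
#include <assert.h>
#include <stddef.h>

/* Per-thread accumulator (hypothetical name). */
static long partial = 0;
#pragma omp threadprivate(partial)

/* Hypothetical helper: same sum as in the question, combined with atomic. */
long sum_squares_atomic(int m, int a[m][m])
{
    assert(a != NULL);
    long mysum = 0; /* shared by all threads: no reduction clause below */

    for (int k = 0; k < m; k++) {
        #pragma omp parallel
        {
            partial = 0; /* assumption: reset for every region */

            #pragma omp single
            {
                for (int i = 0; i < m; i++) {
                    #pragma omp task
                    partial += (long)a[i][k] * a[i][k];
                }
            } /* the implicit barrier here already waits for the tasks,
                 so no taskwait is used in this sketch */

            /* Each thread adds its own partial result to the shared sum. */
            #pragma omp atomic
            mysum += partial;
        }
    }
    return mysum;
}
```

With the reduction clause kept, as in the question, mysum is already private to each thread inside the region, so the plain update would not race; the atomic matters in the shared variant shown here.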
































  • You say that k is shared in the task-generating construct, but then you say that it's firstprivate. It's a bit unclear to me what's happening with k.

    – Cosmin Ioniță
    Nov 26 '18 at 21:05













  • Sorry, there was a typo and I misread your code regarding i. The first sentence (implicitly shared/shared) applies to k. You should define i locally; then it implicitly becomes private/firstprivate, which is what you want. You may also specify the data-sharing attributes manually, using firstprivate(i) instead of private(i).

    – Zulan
    Nov 27 '18 at 8:02











  • Understood. Thanks a lot for your answer!

    – Cosmin Ioniță
    Nov 27 '18 at 14:08











  • @CosminIoniță please note that the recently released OpenMP 5.0 natively supports task_reduction, but it is a bit different and I have no experience with it yet. As far as I know, it is so far supported only by the Intel 18.0 compiler.

    – Zulan
    Nov 28 '18 at 11:54











  • Great to know! Thanks, @Zulan!

    – Cosmin Ioniță
    Nov 28 '18 at 17:18
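For readers curious about the OpenMP 5.0 feature mentioned in the comments, a minimal sketch follows. It assumes a compiler that implements the task_reduction clause on taskgroup and the in_reduction clause on task; the function name sum_squares_task_reduction is hypothetical, and this is not code from the answer's author.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper using OpenMP 5.0 task reductions. */
long sum_squares_task_reduction(int m, int a[m][m])
{
    assert(a != NULL);
    long mysum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* The taskgroup defines the scope of the reduction over mysum. */
        #pragma omp taskgroup task_reduction(+:mysum)
        {
            for (int k = 0; k < m; k++) {
                for (int i = 0; i < m; i++) {
                    /* Each task contributes to the enclosing reduction;
                       i and k are firstprivate in the task by default. */
                    #pragma omp task in_reduction(+:mysum)
                    mysum += (long)a[i][k] * a[i][k];
                }
            }
        } /* end of taskgroup: tasks are complete and mysum is combined */
    }
    return mysum;
}
```

Here the runtime manages the private copies, so neither a threadprivate accumulator nor an atomic update is needed.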












































edited Nov 27 '18 at 8:00

























answered Nov 26 '18 at 12:27









Zulan





































