Question about OpenMP sections and critical

I am trying to make a fast parallel loop. In each iteration of the loop, I build an array which is costly so I want it distributed over many threads. After the array is built, I use it to update a matrix. Here it gets tricky because the matrix is common to all threads so only 1 thread can modify parts of the matrix at one time, but when I work on the matrix, it turns out I can distribute that work too since I can work on different parts of the matrix at the same time.

Here is what I currently am doing:

#pragma omp parallel for
for (i = 0; i < n; ++i)
{
  ... build array bi ...
  #pragma omp critical
  {
    update_matrix(A, bi);
  }
}

...

subroutine update_matrix(A, b)
{
  printf("id0 = %d\n", omp_get_thread_num());
  #pragma omp parallel sections
  {
    #pragma omp section
    {
      printf("id1 = %d\n", omp_get_thread_num());
      modify columns 1 to j of A using b
    }

    #pragma omp section
    {
      printf("id2 = %d\n", omp_get_thread_num());
      modify columns j+1 to k of A using b
    }
  }
}

The problem is that the two different sections of the update_matrix() routine are not being parallelized. The output I get looks like this:

id0 = 19
id1 = 0
id2 = 0
id0 = 5
id1 = 0
id2 = 0

...

So the two sections are being executed by the same thread (0). I tried removing the #pragma omp critical in the main loop but it gives the same result. Does anyone know what I'm doing wrong?

edited Nov 29 '18 at 1:38

asked Nov 28 '18 at 19:49

vibe


I'm not sure it'll be of any use in terms of performance, but if you want nested parallelism (which is what you're trying to achieve here), you need to enable it explicitly. That can be done by setting the environment variable OMP_NESTED to 'true', or by calling omp_set_nested() inside the code.

– Gilles
Nov 29 '18 at 6:50

parallel-processing openmp

1 Answer

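#pragma omp parallel sections does not create extra threads there because you are already inside a parallel region, the one created by the #pragma omp parallel for directive. Unless you have enabled nested parallelism, for example with omp_set_nested(1);, the inner parallel region is serialized: it runs with a team of a single thread, which is why both sections report thread 0. (Since OpenMP 5.0, omp_set_nested() is deprecated in favor of omp_set_max_active_levels().)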
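
To see the effect of the nesting setting, here is a minimal sketch in C. The function name inner_team_size is made up for this illustration; it uses omp_set_max_active_levels() rather than omp_set_nested(), and it falls back to a team size of 1 when the file is compiled without OpenMP support.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: reports how many threads a parallel region gets
 * when it is nested inside another parallel region.
 * max_levels = 1 leaves nesting disabled, max_levels = 2 allows one
 * nested level. Returns 1 when compiled without OpenMP. */
int inner_team_size(int max_levels)
{
    assert(max_levels >= 1);
    int size = 1;
#ifdef _OPENMP
    omp_set_max_active_levels(max_levels);
    #pragma omp parallel num_threads(2)
    {
        /* Only one outer thread opens the nested region. */
        #pragma omp single
        {
            #pragma omp parallel num_threads(2)
            {
                /* One inner thread records the size of the inner team. */
                #pragma omp single
                size = omp_get_num_threads();
            }
        }
    }
#endif
    return size;
}
```

Calling inner_team_size(1) and inner_team_size(2) from main and printing the results should show whether your runtime actually hands out threads to the inner region. The runtime is still free to give fewer threads than requested.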

Please note that nesting is not necessarily efficient: spawning new threads has an overhead that may not be worth paying if the update_matrix part is not very CPU intensive.

You have several options:

Forget about it. If the non-critical part of the loop really dominates the computation and you already have as many threads as CPUs, spawning extra threads for a simple operation will do no good. Just remove the parallel sections directive in the subroutine.

Try enabling nesting with omp_set_nested(1);

Another option is to use named critical sections. Only one thread at a time may be in critical section ONE_TO_J and only one in critical section J_TO_K, so up to two threads may update the matrix in parallel. This comes at the cost of doubling the synchronization overhead.

#pragma omp parallel for
for (i = 0; i < n; ++i)
{
  ... build array bi ...
  update_matrix(A, bi); // not critical
}

...

subroutine update_matrix(A, b)
{
  printf("id0 = %d\n", omp_get_thread_num());

  #pragma omp critical(ONE_TO_J)
  {
    printf("id1 = %d\n", omp_get_thread_num());
    modify columns 1 to j of A using b
  }

  #pragma omp critical(J_TO_K)
  {
    printf("id2 = %d\n", omp_get_thread_num());
    modify columns j+1 to k of A using b
  }
}
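
For reference, a compilable sketch of the named-critical idea could look like the following. The matrix sizes, the helper add_to_columns, the driver run_updates, and the "add b[r] to each entry" update rule are all placeholders invented for the example; only the two critical-section names come from the pseudocode above.

```c
#include <assert.h>

enum { ROWS = 4, COLS = 8, SPLIT = 4 };  /* hypothetical sizes and split point */

/* Placeholder update: adds b[r] to every entry in columns [first, last). */
void add_to_columns(double A[ROWS][COLS], const double b[ROWS],
                    int first, int last)
{
    for (int r = 0; r < ROWS; ++r)
        for (int c = first; c < last; ++c)
            A[r][c] += b[r];
}

/* Each half of the matrix is protected by its own named critical section,
 * so one thread can be in ONE_TO_J while another is in J_TO_K. */
void update_matrix(double A[ROWS][COLS], const double b[ROWS])
{
    #pragma omp critical(ONE_TO_J)
    add_to_columns(A, b, 0, SPLIT);

    #pragma omp critical(J_TO_K)
    add_to_columns(A, b, SPLIT, COLS);
}

/* Driver: n iterations, each building a private b and updating the shared A. */
void run_updates(double A[ROWS][COLS], int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double b[ROWS];
        for (int r = 0; r < ROWS; ++r)
            b[r] = 1.0;  /* stands in for the expensive array build */
        update_matrix(A, b);
    }
}
```

Because every write to a given half of A happens inside the same named section, the result is the same as with a single critical section, while two threads can work on different halves at once.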

Or use atomic operations to edit the matrix, if this is suitable.

#pragma omp parallel for
for (i = 0; i < n; ++i)
{
  ... build array bi ...
  update_matrix(A, bi); // not critical
}

...

subroutine update_matrix(A, b)
{
    float tmp;
    printf("id0 = %d\n", omp_get_thread_num());
    for (int row = 0; row < max_row; row++)
        for (int column = 0; column < k; column++) {
            // compute the contribution outside the atomic update
            tmp = some_function(b, row, column);
            #pragma omp atomic
            A[column][row] += tmp;
        }
}
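
A self-contained variant of the atomic approach might look like this. The sizes, the body of some_function, and the driver run_updates_atomic are assumptions made up so that the sketch compiles; unlike the pseudocode above, A is indexed as A[row][column] here.

```c
#include <assert.h>

enum { MAX_ROW = 4, K = 6 };  /* hypothetical matrix dimensions */

/* Placeholder for the real per-element contribution. */
float some_function(const float *b, int row, int column)
{
    return b[row] * (float)(column + 1);
}

/* Every element update is atomic, so no critical section is needed. */
void update_matrix_atomic(float A[MAX_ROW][K], const float *b)
{
    for (int row = 0; row < MAX_ROW; row++)
        for (int column = 0; column < K; column++) {
            float tmp = some_function(b, row, column);
            #pragma omp atomic
            A[row][column] += tmp;
        }
}

/* Driver: n iterations, each building a private b and updating the shared A. */
void run_updates_atomic(float A[MAX_ROW][K], int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        float b[MAX_ROW];
        for (int row = 0; row < MAX_ROW; row++)
            b[row] = 1.0f;  /* stands in for the expensive array build */
        update_matrix_atomic(A, b);
    }
}
```

This only works when the update can be written as a single read-modify-write such as +=. If one element update needs several statements, a critical section is the safer choice.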

By the way, C stores arrays in row-major order, so you should update the matrix row by row rather than column by column. This improves the memory-access performance of the algorithm and reduces the risk of false sharing between threads.
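
To illustrate the row-major remark, here is a small sketch using a flat array. The names row_major_index and add_vector_to_rows are invented for the example, and the update rule is just a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Position of element (row, col) in a row-major array with `cols` columns:
 * consecutive columns of the same row are adjacent in memory. */
size_t row_major_index(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* Walks the matrix row by row, so the inner loop touches contiguous memory. */
void add_vector_to_rows(double *A, const double *b, size_t rows, size_t cols)
{
    for (size_t row = 0; row < rows; ++row)
        for (size_t col = 0; col < cols; ++col)
            A[row_major_index(row, col, cols)] += b[row];
}
```

Swapping the two loops would give the same result but jump through memory with a stride of cols elements on every inner iteration.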

answered Nov 29 '18 at 9:12

Brice
