Dataflow unable to GET a reference to BigQuery tables when too many cores or more than one machine
My streaming Dataflow pipeline is supposed to read analytics hits from Pub/Sub and write them to BigQuery. If I use too many machines, or machines that are too big, it throws rate limit errors while getting a reference to the tables, more precisely while executing _get_or_create_table.
The rate limit being hit seems to be one of these BigQuery API quotas: 100 API requests per second per user, or 300 concurrent API requests per user.
It does not block the pipeline (rows do get written eventually), but I have the feeling it blocks some of the threads and prevents me from taking full advantage of the parallelization. Switching from one machine with 4 CPUs to five machines with 8 CPUs each brought no improvement in latency (it actually got worse).
How can I avoid this error and have a large number of machines write to BigQuery?
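For context, here is roughly how the worker configuration for the two setups would be expressed as Dataflow pipeline options. This is an illustrative sketch rather than a verbatim copy of my launch command, and the machine type names are assumed from the CPU counts above:

from apache_beam.options.pipeline_options import PipelineOptions

# Sketch of the second setup: 5 workers with 8 CPUs each.
# The first setup was a single 4-CPU worker (--num_workers=1, n1-standard-4).
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project={}'.format(args.project),
    '--streaming',
    '--num_workers=5',
    '--machine_type=n1-standard-8',
])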
Here's the log from the Dataflow monitoring interface. It appears regularly after I launch the pipeline:
...
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 1087, in get_or_create_table
found_table = self._get_table(project_id, dataset_id, table_id)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 184, in wrapper
return fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 925, in _get_table
response = self.client.tables.Get(request)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 611, in Get
config, request, global_params=global_params)
File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 722, in _RunMethod
return self.ProcessHttpResponse(method_config, http_response, request)
File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 728, in ProcessHttpResponse
self.__ProcessHttpResponse(method_config, http_response, request))
File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 599, in __ProcessHttpResponse
http_response, method_config=method_config, request=request)
HttpForbiddenError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/<project_id>/datasets/<dataset_id>/tables/<table_id>?alt=json>: response: <{'status': '403', 'content-length': '577', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'expires': 'Sun, 25 Nov 2018 14:36:24 GMT', 'vary': 'Origin, X-Origin', 'server': 'GSE', '-content-encoding': 'gzip', 'cache-control': 'private, max-age=0', 'date': 'Sun, 25 Nov 2018 14:36:24 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{
"error": {
"errors": [
{
"domain": "global",
"reason": "rateLimitExceeded",
"message": "Exceeded rate limits: Your user_method exceeded quota for api requests per user per method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
"locationType": "other",
"location": "helix_api.method_request"
}
],
"code": 403,
"message": "Exceeded rate limits: Your user_method exceeded quota for api requests per user per method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
Here's the pipeline code. I stripped out almost everything from it to see whether the error still happens:
p = beam.Pipeline(options=options)

msgs = p | 'Read' >> beam.io.gcp.pubsub.ReadFromPubSub(
    topic='projects/{project}/topics/{topic}'.format(
        project=args.project, topic=args.hits_topic),
    id_label='hit_id',
    timestamp_attribute='time')

lines = msgs | beam.Map(lambda x: {'content': x})

(lines
 | 'WriteToBQ' >> beam.io.gcp.bigquery.WriteToBigQuery(
     args.table,
     dataset=args.dataset,
     project=args.project))
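The write step above relies on WriteToBigQuery's default dispositions (CREATE_IF_NEEDED / WRITE_APPEND). One variant I have in mind, but have not tested, makes them explicit; it assumes the destination table already exists, and I have not verified in the SDK source whether CREATE_NEVER actually lets workers skip the tables.get lookup:

# Untested sketch: the same write step with explicit dispositions.
# CREATE_NEVER assumes the destination table already exists; whether this
# avoids the per-worker tables.get call is exactly what I am unsure about.
(lines
 | 'WriteToBQ' >> beam.io.gcp.bigquery.WriteToBigQuery(
     args.table,
     dataset=args.dataset,
     project=args.project,
     create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
     write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))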
python google-bigquery google-cloud-dataflow apache-beam
asked Nov 25 '18 at 15:15 by T. Pilewicz, last edited Dec 4 '18 at 9:59
Can you share your Dataflow code? – Graham Polley, Nov 26 '18 at 1:26
Sure, just added this. – T. Pilewicz, Nov 26 '18 at 9:10