Reading XBRL facts - Java

I need to extract a few facts from SEC 10-K filings (for example gross revenue, gross profit, gross margin and operating expenses), along with their corresponding contexts.

For filings like https://www.sec.gov/Archives/edgar/data/1318605/000156459018002956/tsla-20171231.xml, it seems feasible to use XPath to find the few required elements and their values.
But in filings like https://www.sec.gov/Archives/edgar/data/19617/000001961718000057/jpm-20171231.xml, total expense is broken up across different segments using an extension taxonomy.

My questions are:

  1. What would be a reliable way to work with files like these? Say I just want total operating expenses: is there a reliable way to find out which elements I need to read and then perhaps sum up?

  2. I've tried using the UBMatrix library to read XBRL files. It works on some non-SEC files (I can read node values), but it throws a NullPointerException on SEC 10-K filings. Is there a particular reason why XBRL instance documents from the SEC would fail? (I haven't checked the library code.)

In any case, I'd prefer to do this with plain XPath if possible.
The validity of the XBRL document is not important.

Tags: java xml xpath xbrl

asked Nov 21 at 19:06 by DebD

          2 Answers

The most reliable way to work with XBRL files is to use an XBRL processing library. There are several for Java, some proprietary (paid) and some open source.

There is a maintained list of tools and services on xbrl.org:

https://www.xbrl.org/the-standard/how/tools-and-services/

As far as I know, SEC filings are reliable: they are widely consumed and have been tested on many processors. If UBMatrix fails on them, for example with a null pointer exception, I recommend reporting it to the maintainers so they can address it.

It is possible, at least in theory, to use XPath, XQuery or XSLT instead, since XBRL uses XML syntax. Be aware, though, that as soon as you resolve contexts (a join, in relational terms) you are in effect re-implementing an incomplete XBRL processor from scratch, with the risk of bugs and the sunk costs that go with it. Beyond the core XBRL specification there is an ecosystem of further specifications (e.g. Dimensions), with many subtleties to take into account in order not to retrieve the wrong values. An existing processor builds on the effort others have already invested in getting the XBRL semantics right; this is a benefit of XBRL being a standard.
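To make the context join concrete, here is a minimal Java sketch using only the JDK's XPath API. It is an illustration under stated assumptions, not a replacement for an XBRL processor: the inline sample instance, its entity identifier, values and the `ex` extension namespace are all made up, units are omitted for brevity, and the class and method names are hypothetical. It matches facts by local name only and keeps those whose context carries no dimension members, which is roughly how a "total" is told apart from a segment breakdown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XbrlXPathSketch {

    // Hypothetical miniature instance: made-up entity, values and extension namespace.
    static final String ENTITY_ID =
        "<identifier scheme='http://www.sec.gov/CIK'>0000000000</identifier>";
    static final String PERIOD =
        "<period><startDate>2017-01-01</startDate><endDate>2017-12-31</endDate></period>";
    static final String SAMPLE =
        "<xbrl xmlns='http://www.xbrl.org/2003/instance'"
        + " xmlns:xbrldi='http://xbrl.org/2006/xbrldi'"
        + " xmlns:us-gaap='http://fasb.org/us-gaap/2017-01-31'"
        + " xmlns:ex='http://example.com/ext'>"
        + "<context id='FY2017'><entity>" + ENTITY_ID + "</entity>" + PERIOD + "</context>"
        + "<context id='FY2017_SegA'><entity>" + ENTITY_ID
        + "<segment><xbrldi:explicitMember dimension='us-gaap:StatementBusinessSegmentsAxis'>"
        + "ex:SegAMember</xbrldi:explicitMember></segment></entity>" + PERIOD + "</context>"
        + "<us-gaap:OperatingExpenses contextRef='FY2017' decimals='-3'>1000000</us-gaap:OperatingExpenses>"
        + "<us-gaap:OperatingExpenses contextRef='FY2017_SegA' decimals='-3'>400000</us-gaap:OperatingExpenses>"
        + "</xbrl>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for local-name() to drop the prefix
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns "contextId endDate value" for facts whose context has no dimensions. */
    static List<String> undimensionedFacts(Document doc, String localName) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            // Step 1: top-level facts with the requested local name.
            // Simplification: the namespace is ignored, so an extension element
            // with the same local name would match too.
            NodeList facts = (NodeList) xp.evaluate(
                "/*/*[local-name()='" + localName + "' and @contextRef]",
                doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < facts.getLength(); i++) {
                Element fact = (Element) facts.item(i);
                String ctxId = fact.getAttribute("contextRef");
                // Step 2: the "join" from the fact to its context.
                // Context ids are XML names, so they cannot contain quotes.
                String ctx = "/*/*[local-name()='context' and @id='" + ctxId + "']";
                double dims = (Double) xp.evaluate(
                    "count(" + ctx + "//*[local-name()='explicitMember'"
                    + " or local-name()='typedMember'])",
                    doc, XPathConstants.NUMBER);
                if (dims == 0) {
                    String end = xp.evaluate(
                        ctx + "//*[local-name()='endDate' or local-name()='instant']", doc);
                    out.add(ctxId + " " + end + " " + fact.getTextContent().trim());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints [FY2017 2017-12-31 1000000]
        System.out.println(undimensionedFacts(parse(SAMPLE), "OperatingExpenses"));
    }
}
```

Even this small case has to look inside the context to avoid double counting the segment value. Units, `decimals`, scenario-based dimensions, duplicate facts and period selection are all left out, which is the kind of incompleteness described above.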



As a final remark: the exact XBRL tags used for gross revenue, gross profit, etc. may vary from company to company, because some companies use their own tags (extensions) instead of the US-GAAP tags. Some companies also omit facts, which consumers then need to compute from other facts. Both issues can be addressed with mappings and formulas on top of the XBRL processor. Charles Hoffman has shared reports on the matter with a lot of useful advice, and maintains such mappings online (search keywords: fundamental accounting concepts, report frames).
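As a rough illustration of what "mappings and formulas" can look like, the following Java sketch (Java 9 or later) tries several candidate tags for one concept and falls back to a computed value when a fact is not reported. The candidate lists are illustrative guesses only, not the published fundamental accounting concepts mappings, and the class name, method names and input map are hypothetical; the input is assumed to hold facts already extracted for a single context.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ConceptMappingSketch {

    // Illustrative candidates in priority order; real mappings are much longer.
    static final Map<String, List<String>> CANDIDATES = new LinkedHashMap<>();
    static {
        CANDIDATES.put("Revenues", List.of(
            "Revenues", "SalesRevenueNet",
            "RevenueFromContractWithCustomerExcludingAssessedTax"));
        CANDIDATES.put("CostOfRevenue", List.of(
            "CostOfRevenue", "CostOfGoodsAndServicesSold"));
        CANDIDATES.put("GrossProfit", List.of("GrossProfit"));
    }

    /** Mapping step: returns the first candidate tag that was actually reported. */
    static Optional<Long> resolve(String concept, Map<String, Long> reported) {
        for (String tag : CANDIDATES.getOrDefault(concept, List.of(concept))) {
            Long value = reported.get(tag);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    /** Formula step: GrossProfit = Revenues - CostOfRevenue when not reported directly. */
    static Optional<Long> grossProfit(Map<String, Long> reported) {
        Optional<Long> direct = resolve("GrossProfit", reported);
        if (direct.isPresent()) {
            return direct;
        }
        Optional<Long> revenues = resolve("Revenues", reported);
        Optional<Long> cost = resolve("CostOfRevenue", reported);
        if (revenues.isPresent() && cost.isPresent()) {
            return Optional.of(revenues.get() - cost.get());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up values for a filer that reports neither Revenues nor GrossProfit.
        Map<String, Long> reported = Map.of(
            "SalesRevenueNet", 500L,
            "CostOfGoodsAndServicesSold", 300L);
        System.out.println(grossProfit(reported)); // prints Optional[200]
    }
}
```

Company-specific extension tags would still need their own entries or a lookup through the filing's extension taxonomy, which this sketch does not attempt.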

answered Nov 22 at 9:50 (edited Nov 22 at 9:55) by Ghislain Fourny

• Can you point to a library (free or paid) that you know works well or have used? I tried UBMatrix and XBRLAPI, neither of which worked for me. – DebD, Nov 22 at 13:59

• For example, I can name Arelle, ReportingStandard, Fujitsu, and the XBRL package in R. – Ghislain Fourny, Nov 28 at 19:01

Depending on what you want to do with the data, I would recommend looking at the XBRL US API. It provides API access to all SEC filings and makes the data available as JSON. You can get a free API key for "private, non-commercial research and development".

I'd also look at the Arelle open source project, an XBRL processor written in Python. In particular, it has a plugin that provides the data in xBRL-JSON format, which you will probably find much easier to work with than the raw XML files, and which takes care of the processing complexity that Ghislain refers to.

answered Nov 27 at 20:53 by pdw