Retrieving all Excel file links from a webpage with Excel VBA












2














I'm trying to get all the downloadable Excel file links from the website, but having difficulty. Please help to guide me. Thanks.



Sub TYEX()

Dim internet As Object
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link As Object
Dim URL As String

Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True

URL = "https://www.jpx.co.jp/markets/public/short-selling/index.html"
internet.Navigate URL

Do Until internet.ReadyState >= 4
DoEvents
Loop

Application.Wait Now + TimeSerial(0, 0, 5)

Set internetdata = internet.Document
Set div_result = internetdata.getElementById("readArea")

Set header_links = div_result.getElementsByTagName("td")

For Each h In header_links
Set link = h.ChildNodes.item(0)
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
Next

MsgBox "done"
End Sub









share|improve this question
























  • Maybe you can include more info on where is the difficulty and what have you tried.
    – iamanigeeit
    Nov 23 at 9:22
















2














I'm trying to get all the downloadable Excel file links from the website, but having difficulty. Please help to guide me. Thanks.



Sub TYEX()

Dim internet As Object
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link As Object
Dim URL As String

Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True

URL = "https://www.jpx.co.jp/markets/public/short-selling/index.html"
internet.Navigate URL

Do Until internet.ReadyState >= 4
DoEvents
Loop

Application.Wait Now + TimeSerial(0, 0, 5)

Set internetdata = internet.Document
Set div_result = internetdata.getElementById("readArea")

Set header_links = div_result.getElementsByTagName("td")

For Each h In header_links
Set link = h.ChildNodes.item(0)
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
Next

MsgBox "done"
End Sub









share|improve this question
























  • Maybe you can include more info on where is the difficulty and what have you tried.
    – iamanigeeit
    Nov 23 at 9:22














2












2








2







I'm trying to get all the downloadable Excel file links from the website, but having difficulty. Please help to guide me. Thanks.



Sub TYEX()

Dim internet As Object
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link As Object
Dim URL As String

Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True

URL = "https://www.jpx.co.jp/markets/public/short-selling/index.html"
internet.Navigate URL

Do Until internet.ReadyState >= 4
DoEvents
Loop

Application.Wait Now + TimeSerial(0, 0, 5)

Set internetdata = internet.Document
Set div_result = internetdata.getElementById("readArea")

Set header_links = div_result.getElementsByTagName("td")

For Each h In header_links
Set link = h.ChildNodes.item(0)
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
Next

MsgBox "done"
End Sub









share|improve this question















I'm trying to get all the downloadable Excel file links from the website, but having difficulty. Please help to guide me. Thanks.



Sub TYEX()

Dim internet As Object
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link As Object
Dim URL As String

Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True

URL = "https://www.jpx.co.jp/markets/public/short-selling/index.html"
internet.Navigate URL

Do Until internet.ReadyState >= 4
DoEvents
Loop

Application.Wait Now + TimeSerial(0, 0, 5)

Set internetdata = internet.Document
Set div_result = internetdata.getElementById("readArea")

Set header_links = div_result.getElementsByTagName("td")

For Each h In header_links
Set link = h.ChildNodes.item(0)
Cells(Range("A" & Rows.Count).End(xlUp).Row + 1, 1) = link.href
Next

MsgBox "done"
End Sub






excel vba excel-vba






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 23 at 8:58









K.Dᴀᴠɪs

6,965112139




6,965112139










asked Nov 23 at 8:45









GenZ

133




133












  • Maybe you can include more info on where is the difficulty and what have you tried.
    – iamanigeeit
    Nov 23 at 9:22


















  • Maybe you can include more info on where is the difficulty and what have you tried.
    – iamanigeeit
    Nov 23 at 9:22
















Maybe you can include more info on where is the difficulty and what have you tried.
– iamanigeeit
Nov 23 at 9:22




Maybe you can include more info on where is the difficulty and what have you tried.
– iamanigeeit
Nov 23 at 9:22












2 Answers
2






active

oldest

votes


















3














You can use an attribute = value CSS selector with $ operator to say the href value must end with .xls. Then use querySelectorAll to retrieve all matched results. Using CSS selectors is a very fast and generally robust method.



Dim list As Object
Set list = ie.document.querySelectorAll("[href$='.xls']")




It is much faster to use XMLHTTP as well, rather than opening IE. Note that you can then pass these links to a function to perform a binary download or to URLMon for downloading.



Option Explicit   
Public Sub Links()
Dim sResponse As String, html As HTMLDocument, list As Object, i As Long

With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.jpx.co.jp/markets/public/short-selling/index.html", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With

Set html = New HTMLDocument
With html
.body.innerHTML = sResponse
Set list = html.querySelectorAll("[href$='.xls']")
End With
For i = 0 To list.Length - 1
Debug.Print Replace$(list.item(i), "about:", "https://www.jpx.co.jp")
Next
End Sub




Example download function (though you could re-use your existing XMLHTTP object - this is just to illustrate):



Public Function DownloadFile(ByVal downloadFolder As String, ByVal downloadURL As String) As String
Dim http As Object , tempArr As Variant
Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
http.Open "GET", downloadURL, False
http.send
On Error GoTo errhand
With CreateObject("ADODB.Stream")
.Open
.Type = 1
.write http.responseBody
tempArr = Split(downloadURL, "/")
tempArr = tempArr(UBound(tempArr))
.SaveToFile downloadFolder & tempArr, 2 '< "/" on enter of downloadFolder. 2 for overwrite which is Ok if no file modifications.
.Close
End With
DownloadFile = downloadFolder & tempArr
Exit Function
errhand:
If Err.Number <> 0 Then
Debug.Print Err.Number, Err.Description
MsgBox "Download failed"
End If
DownloadFile = vbNullString
End Function




References (VBE > Tools > References):




  1. Microsoft HTML Object Library






share|improve this answer























  • I was hoping you would post here. I still try to learn from your technique using querySelectorAll() - it seems pretty powerful. OP, you should probably go with this method.
    – K.Dᴀᴠɪs
    Nov 23 at 9:36








  • 1




    @K.Dᴀᴠɪs Pop over to the dawghaus if you ever what to discuss or ask for any useful references, or drop off any of your own. We have a library we add to as well.
    – QHarr
    Nov 23 at 9:39






  • 1




    Sure will - thanks!
    – K.Dᴀᴠɪs
    Nov 23 at 9:40










  • Thanks QHarr! Will try your method.
    – GenZ
    Nov 23 at 11:06










  • Cool. Any problems let me know and will explain.
    – QHarr
    Nov 24 at 5:55



















4














You had the idea down correctly, but here's a different approach:



Sub TYEX()

Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")
With ie
.navigate "https://www.jpx.co.jp/markets/public/short-selling/index.html"
.Visible = True

Do While .Busy Or .readyState < 4
DoEvents
Loop

Dim doc As Object, tbl As Object
Set doc = .document
Set tbl = doc.getElementsByClassName("component-normal-table")(0).Children(0)

Dim r As Long, xlsArr(), a As Object

With tbl.Rows
ReDim xlsArr(1 To .Length - 1)
For r = 1 To .Length - 1 ' 0 is the table header
xlsArr(r) = .Item(r).Children(1).innerHTML
Next r
End With

With CreateObject("VBScript.RegExp")
.Pattern = "<a href=""(/markets.*?.xls)"
For r = 1 To UBound(xlsArr)
xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
Debug.Print xlsArr(r)
Next
End With
End With

'Add to sheet
Dim ws As Worksheet, rng As Range
Set ws = ThisWorkbook.Worksheets(1)
With ws
Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
xlsArr) - 1, 1))
rng.Value = Application.Transpose(xlsArr)
End With

End Sub

Public Function NextRow(ByVal ws As Worksheet, Optional ByVal col As Variant = 1) As Long
With ws
NextRow = .Cells(.Rows.Count, col).End(xlUp).Row + 1
End With
End Function




Breaking Down the Code



This will loop your html table rows. We start at 1, because 0 is actually just the table header.



With tbl.Rows
ReDim xlsArr(1 To .Length - 1)
For r = 1 To .Length - 1 ' 0 is the table header
xlsArr(r) = .Item(r).Children(1).innerHTML
Next r
End With


This uses regular expressions to extract the url from the innerHTML property. You can see how this particular regex works here: Regex101



With CreateObject("VBScript.RegExp")
.Pattern = "<a href=""(/markets.*?.xls)"
For r = 1 To UBound(xlsArr)
xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
Debug.Print xlsArr(r)
Next
End With


You will size your range to be the same size as your array that contains the links, then write the array to the worksheet. This is usually much faster than writing cells one-by-one.



'Add to sheet
Dim ws As Worksheet, rng As Range
Set ws = ThisWorkbook.Worksheets(1)
With ws
Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
xlsArr) - 1, 1))
rng.Value = Application.Transpose(xlsArr)
End With





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53443254%2fretrieving-all-excel-file-links-from-a-webpage-with-excel-vba%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    You can use an attribute = value CSS selector with $ operator to say the href value must end with .xls. Then use querySelectorAll to retrieve all matched results. Using CSS selectors is a very fast and generally robust method.



    Dim list As Object
    Set list = ie.document.querySelectorAll("[href$='.xls']")




    It is much faster to use XMLHTTP as well, rather than opening IE. Note that you can then pass these links to a function to perform a binary download or to URLMon for downloading.



    Option Explicit   
    Public Sub Links()
    Dim sResponse As String, html As HTMLDocument, list As Object, i As Long

    With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://www.jpx.co.jp/markets/public/short-selling/index.html", False
    .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    .send
    sResponse = StrConv(.responseBody, vbUnicode)
    End With

    Set html = New HTMLDocument
    With html
    .body.innerHTML = sResponse
    Set list = html.querySelectorAll("[href$='.xls']")
    End With
    For i = 0 To list.Length - 1
    Debug.Print Replace$(list.item(i), "about:", "https://www.jpx.co.jp")
    Next
    End Sub




    Example download function (though you could re-use your existing XMLHTTP object - this is just to illustrate):



    Public Function DownloadFile(ByVal downloadFolder As String, ByVal downloadURL As String) As String
    Dim http As Object , tempArr As Variant
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", downloadURL, False
    http.send
    On Error GoTo errhand
    With CreateObject("ADODB.Stream")
    .Open
    .Type = 1
    .write http.responseBody
    tempArr = Split(downloadURL, "/")
    tempArr = tempArr(UBound(tempArr))
    .SaveToFile downloadFolder & tempArr, 2 '< "/" on enter of downloadFolder. 2 for overwrite which is Ok if no file modifications.
    .Close
    End With
    DownloadFile = downloadFolder & tempArr
    Exit Function
    errhand:
    If Err.Number <> 0 Then
    Debug.Print Err.Number, Err.Description
    MsgBox "Download failed"
    End If
    DownloadFile = vbNullString
    End Function




    References (VBE > Tools > References):




    1. Microsoft HTML Object Library






    share|improve this answer























    • I was hoping you would post here. I still try to learn from your technique using querySelectorAll() - it seems pretty powerful. OP, you should probably go with this method.
      – K.Dᴀᴠɪs
      Nov 23 at 9:36








    • 1




      @K.Dᴀᴠɪs Pop over to the dawghaus if you ever what to discuss or ask for any useful references, or drop off any of your own. We have a library we add to as well.
      – QHarr
      Nov 23 at 9:39






    • 1




      Sure will - thanks!
      – K.Dᴀᴠɪs
      Nov 23 at 9:40










    • Thanks QHarr! Will try your method.
      – GenZ
      Nov 23 at 11:06










    • Cool. Any problems let me know and will explain.
      – QHarr
      Nov 24 at 5:55
















    3














    You can use an attribute = value CSS selector with $ operator to say the href value must end with .xls. Then use querySelectorAll to retrieve all matched results. Using CSS selectors is a very fast and generally robust method.



    Dim list As Object
    Set list = ie.document.querySelectorAll("[href$='.xls']")




    It is much faster to use XMLHTTP as well, rather than opening IE. Note that you can then pass these links to a function to perform a binary download or to URLMon for downloading.



    Option Explicit   
    Public Sub Links()
    Dim sResponse As String, html As HTMLDocument, list As Object, i As Long

    With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://www.jpx.co.jp/markets/public/short-selling/index.html", False
    .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    .send
    sResponse = StrConv(.responseBody, vbUnicode)
    End With

    Set html = New HTMLDocument
    With html
    .body.innerHTML = sResponse
    Set list = html.querySelectorAll("[href$='.xls']")
    End With
    For i = 0 To list.Length - 1
    Debug.Print Replace$(list.item(i), "about:", "https://www.jpx.co.jp")
    Next
    End Sub




    Example download function (though you could re-use your existing XMLHTTP object - this is just to illustrate):



    Public Function DownloadFile(ByVal downloadFolder As String, ByVal downloadURL As String) As String
    Dim http As Object , tempArr As Variant
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", downloadURL, False
    http.send
    On Error GoTo errhand
    With CreateObject("ADODB.Stream")
    .Open
    .Type = 1
    .write http.responseBody
    tempArr = Split(downloadURL, "/")
    tempArr = tempArr(UBound(tempArr))
    .SaveToFile downloadFolder & tempArr, 2 '< "/" on enter of downloadFolder. 2 for overwrite which is Ok if no file modifications.
    .Close
    End With
    DownloadFile = downloadFolder & tempArr
    Exit Function
    errhand:
    If Err.Number <> 0 Then
    Debug.Print Err.Number, Err.Description
    MsgBox "Download failed"
    End If
    DownloadFile = vbNullString
    End Function




    References (VBE > Tools > References):




    1. Microsoft HTML Object Library






    share|improve this answer























    • I was hoping you would post here. I still try to learn from your technique using querySelectorAll() - it seems pretty powerful. OP, you should probably go with this method.
      – K.Dᴀᴠɪs
      Nov 23 at 9:36








    • 1




      @K.Dᴀᴠɪs Pop over to the dawghaus if you ever what to discuss or ask for any useful references, or drop off any of your own. We have a library we add to as well.
      – QHarr
      Nov 23 at 9:39






    • 1




      Sure will - thanks!
      – K.Dᴀᴠɪs
      Nov 23 at 9:40










    • Thanks QHarr! Will try your method.
      – GenZ
      Nov 23 at 11:06










    • Cool. Any problems let me know and will explain.
      – QHarr
      Nov 24 at 5:55














    3












    3








    3






    You can use an attribute = value CSS selector with $ operator to say the href value must end with .xls. Then use querySelectorAll to retrieve all matched results. Using CSS selectors is a very fast and generally robust method.



    Dim list As Object
    Set list = ie.document.querySelectorAll("[href$='.xls']")




    It is much faster to use XMLHTTP as well, rather than opening IE. Note that you can then pass these links to a function to perform a binary download or to URLMon for downloading.



    Option Explicit   
    Public Sub Links()
    Dim sResponse As String, html As HTMLDocument, list As Object, i As Long

    With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://www.jpx.co.jp/markets/public/short-selling/index.html", False
    .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    .send
    sResponse = StrConv(.responseBody, vbUnicode)
    End With

    Set html = New HTMLDocument
    With html
    .body.innerHTML = sResponse
    Set list = html.querySelectorAll("[href$='.xls']")
    End With
    For i = 0 To list.Length - 1
    Debug.Print Replace$(list.item(i), "about:", "https://www.jpx.co.jp")
    Next
    End Sub




    Example download function (though you could re-use your existing XMLHTTP object - this is just to illustrate):



    Public Function DownloadFile(ByVal downloadFolder As String, ByVal downloadURL As String) As String
    Dim http As Object , tempArr As Variant
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", downloadURL, False
    http.send
    On Error GoTo errhand
    With CreateObject("ADODB.Stream")
    .Open
    .Type = 1
    .write http.responseBody
    tempArr = Split(downloadURL, "/")
    tempArr = tempArr(UBound(tempArr))
    .SaveToFile downloadFolder & tempArr, 2 '< "/" on enter of downloadFolder. 2 for overwrite which is Ok if no file modifications.
    .Close
    End With
    DownloadFile = downloadFolder & tempArr
    Exit Function
    errhand:
    If Err.Number <> 0 Then
    Debug.Print Err.Number, Err.Description
    MsgBox "Download failed"
    End If
    DownloadFile = vbNullString
    End Function




    References (VBE > Tools > References):




    1. Microsoft HTML Object Library






    share|improve this answer














    You can use an attribute = value CSS selector with $ operator to say the href value must end with .xls. Then use querySelectorAll to retrieve all matched results. Using CSS selectors is a very fast and generally robust method.



    Dim list As Object
    Set list = ie.document.querySelectorAll("[href$='.xls']")




    It is much faster to use XMLHTTP as well, rather than opening IE. Note that you can then pass these links to a function to perform a binary download or to URLMon for downloading.



    Option Explicit   
    Public Sub Links()
    Dim sResponse As String, html As HTMLDocument, list As Object, i As Long

    With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://www.jpx.co.jp/markets/public/short-selling/index.html", False
    .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    .send
    sResponse = StrConv(.responseBody, vbUnicode)
    End With

    Set html = New HTMLDocument
    With html
    .body.innerHTML = sResponse
    Set list = html.querySelectorAll("[href$='.xls']")
    End With
    For i = 0 To list.Length - 1
    Debug.Print Replace$(list.item(i), "about:", "https://www.jpx.co.jp")
    Next
    End Sub




    Example download function (though you could re-use your existing XMLHTTP object - this is just to illustrate):



    Public Function DownloadFile(ByVal downloadFolder As String, ByVal downloadURL As String) As String
    Dim http As Object , tempArr As Variant
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", downloadURL, False
    http.send
    On Error GoTo errhand
    With CreateObject("ADODB.Stream")
    .Open
    .Type = 1
    .write http.responseBody
    tempArr = Split(downloadURL, "/")
    tempArr = tempArr(UBound(tempArr))
    .SaveToFile downloadFolder & tempArr, 2 '< "/" on enter of downloadFolder. 2 for overwrite which is Ok if no file modifications.
    .Close
    End With
    DownloadFile = downloadFolder & tempArr
    Exit Function
    errhand:
    If Err.Number <> 0 Then
    Debug.Print Err.Number, Err.Description
    MsgBox "Download failed"
    End If
    DownloadFile = vbNullString
    End Function




    References (VBE > Tools > References):




    1. Microsoft HTML Object Library







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 24 at 13:13

























    answered Nov 23 at 9:28









    QHarr

    29.9k81841




    29.9k81841












    • I was hoping you would post here. I still try to learn from your technique using querySelectorAll() - it seems pretty powerful. OP, you should probably go with this method.
      – K.Dᴀᴠɪs
      Nov 23 at 9:36








    • 1




      @K.Dᴀᴠɪs Pop over to the dawghaus if you ever what to discuss or ask for any useful references, or drop off any of your own. We have a library we add to as well.
      – QHarr
      Nov 23 at 9:39






    • 1




      Sure will - thanks!
      – K.Dᴀᴠɪs
      Nov 23 at 9:40










    • Thanks QHarr! Will try your method.
      – GenZ
      Nov 23 at 11:06










    • Cool. Any problems let me know and will explain.
      – QHarr
      Nov 24 at 5:55


















    • I was hoping you would post here. I still try to learn from your technique using querySelectorAll() - it seems pretty powerful. OP, you should probably go with this method.
      – K.Dᴀᴠɪs
      Nov 23 at 9:36








    • 1




      @K.Dᴀᴠɪs Pop over to the dawghaus if you ever what to discuss or ask for any useful references, or drop off any of your own. We have a library we add to as well.
      – QHarr
      Nov 23 at 9:39






    • 1




      Sure will - thanks!
      – K.Dᴀᴠɪs
      Nov 23 at 9:40










    • Thanks QHarr! Will try your method.
      – GenZ
      Nov 23 at 11:06










    • Cool. Any problems let me know and will explain.
      – QHarr
      Nov 24 at 5:55
















    I was hoping you would post here. I still try to learn from your technique using querySelectorAll() - it seems pretty powerful. OP, you should probably go with this method.
    – K.Dᴀᴠɪs
    Nov 23 at 9:36






    I was hoping you would post here. I still try to learn from your technique using querySelectorAll() - it seems pretty powerful. OP, you should probably go with this method.
    – K.Dᴀᴠɪs
    Nov 23 at 9:36






    1




    1




    @K.Dᴀᴠɪs Pop over to the dawghaus if you ever what to discuss or ask for any useful references, or drop off any of your own. We have a library we add to as well.
    – QHarr
    Nov 23 at 9:39




    @K.Dᴀᴠɪs Pop over to the dawghaus if you ever what to discuss or ask for any useful references, or drop off any of your own. We have a library we add to as well.
    – QHarr
    Nov 23 at 9:39




    1




    1




    Sure will - thanks!
    – K.Dᴀᴠɪs
    Nov 23 at 9:40




    Sure will - thanks!
    – K.Dᴀᴠɪs
    Nov 23 at 9:40












    Thanks QHarr! Will try your method.
    – GenZ
    Nov 23 at 11:06




    Thanks QHarr! Will try your method.
    – GenZ
    Nov 23 at 11:06












    Cool. Any problems let me know and will explain.
    – QHarr
    Nov 24 at 5:55




    Cool. Any problems let me know and will explain.
    – QHarr
    Nov 24 at 5:55













    4














    You had the idea down correctly, but here's a different approach:



    Sub TYEX()

    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
    .navigate "https://www.jpx.co.jp/markets/public/short-selling/index.html"
    .Visible = True

    Do While .Busy Or .readyState < 4
    DoEvents
    Loop

    Dim doc As Object, tbl As Object
    Set doc = .document
    Set tbl = doc.getElementsByClassName("component-normal-table")(0).Children(0)

    Dim r As Long, xlsArr(), a As Object

    With tbl.Rows
    ReDim xlsArr(1 To .Length - 1)
    For r = 1 To .Length - 1 ' 0 is the table header
    xlsArr(r) = .Item(r).Children(1).innerHTML
    Next r
    End With

    With CreateObject("VBScript.RegExp")
    .Pattern = "<a href=""(/markets.*?.xls)"
    For r = 1 To UBound(xlsArr)
    xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
    Debug.Print xlsArr(r)
    Next
    End With
    End With

    'Add to sheet
    Dim ws As Worksheet, rng As Range
    Set ws = ThisWorkbook.Worksheets(1)
    With ws
    Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
    xlsArr) - 1, 1))
    rng.Value = Application.Transpose(xlsArr)
    End With

    End Sub

    Public Function NextRow(ByVal ws As Worksheet, Optional ByVal col As Variant = 1) As Long
    With ws
    NextRow = .Cells(.Rows.Count, col).End(xlUp).Row + 1
    End With
    End Function




    Breaking Down the Code



    This will loop your html table rows. We start at 1, because 0 is actually just the table header.



    With tbl.Rows
    ReDim xlsArr(1 To .Length - 1)
    For r = 1 To .Length - 1 ' 0 is the table header
    xlsArr(r) = .Item(r).Children(1).innerHTML
    Next r
    End With


    This uses regular expressions to extract the url from the innerHTML property. You can see how this particular regex works here: Regex101



    With CreateObject("VBScript.RegExp")
    .Pattern = "<a href=""(/markets.*?.xls)"
    For r = 1 To UBound(xlsArr)
    xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
    Debug.Print xlsArr(r)
    Next
    End With


    You will size your range to be the same size as your array that contains the links, then write the array to the worksheet. This is usually much faster than writing cells one-by-one.



    'Add to sheet
    Dim ws As Worksheet, rng As Range
    Set ws = ThisWorkbook.Worksheets(1)
    With ws
    Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
    xlsArr) - 1, 1))
    rng.Value = Application.Transpose(xlsArr)
    End With





    share|improve this answer




























      4














      You had the idea down correctly, but here's a different approach:



      Sub TYEX()

      Dim ie As Object
      Set ie = CreateObject("InternetExplorer.Application")
      With ie
      .navigate "https://www.jpx.co.jp/markets/public/short-selling/index.html"
      .Visible = True

      Do While .Busy Or .readyState < 4
      DoEvents
      Loop

      Dim doc As Object, tbl As Object
      Set doc = .document
      Set tbl = doc.getElementsByClassName("component-normal-table")(0).Children(0)

      Dim r As Long, xlsArr(), a As Object

      With tbl.Rows
      ReDim xlsArr(1 To .Length - 1)
      For r = 1 To .Length - 1 ' 0 is the table header
      xlsArr(r) = .Item(r).Children(1).innerHTML
      Next r
      End With

      With CreateObject("VBScript.RegExp")
      .Pattern = "<a href=""(/markets.*?.xls)"
      For r = 1 To UBound(xlsArr)
      xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
      Debug.Print xlsArr(r)
      Next
      End With
      End With

      'Add to sheet
      Dim ws As Worksheet, rng As Range
      Set ws = ThisWorkbook.Worksheets(1)
      With ws
      Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
      xlsArr) - 1, 1))
      rng.Value = Application.Transpose(xlsArr)
      End With

      End Sub

      Public Function NextRow(ByVal ws As Worksheet, Optional ByVal col As Variant = 1) As Long
      With ws
      NextRow = .Cells(.Rows.Count, col).End(xlUp).Row + 1
      End With
      End Function




      Breaking Down the Code



      This will loop your html table rows. We start at 1, because 0 is actually just the table header.



      With tbl.Rows
      ReDim xlsArr(1 To .Length - 1)
      For r = 1 To .Length - 1 ' 0 is the table header
      xlsArr(r) = .Item(r).Children(1).innerHTML
      Next r
      End With


      This uses regular expressions to extract the url from the innerHTML property. You can see how this particular regex works here: Regex101



      With CreateObject("VBScript.RegExp")
      .Pattern = "<a href=""(/markets.*?.xls)"
      For r = 1 To UBound(xlsArr)
      xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
      Debug.Print xlsArr(r)
      Next
      End With


      You will size your range to be the same size as your array that contains the links, then write the array to the worksheet. This is usually much faster than writing cells one-by-one.



      'Add to sheet
      Dim ws As Worksheet, rng As Range
      Set ws = ThisWorkbook.Worksheets(1)
      With ws
      Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
      xlsArr) - 1, 1))
      rng.Value = Application.Transpose(xlsArr)
      End With





      share|improve this answer


























        4












        4








        4






        You had the idea down correctly, but here's a different approach:



        Sub TYEX()

        Dim ie As Object
        Set ie = CreateObject("InternetExplorer.Application")
        With ie
        .navigate "https://www.jpx.co.jp/markets/public/short-selling/index.html"
        .Visible = True

        Do While .Busy Or .readyState < 4
        DoEvents
        Loop

        Dim doc As Object, tbl As Object
        Set doc = .document
        Set tbl = doc.getElementsByClassName("component-normal-table")(0).Children(0)

        Dim r As Long, xlsArr(), a As Object

        With tbl.Rows
        ReDim xlsArr(1 To .Length - 1)
        For r = 1 To .Length - 1 ' 0 is the table header
        xlsArr(r) = .Item(r).Children(1).innerHTML
        Next r
        End With

        With CreateObject("VBScript.RegExp")
        .Pattern = "<a href=""(/markets.*?.xls)"
        For r = 1 To UBound(xlsArr)
        xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
        Debug.Print xlsArr(r)
        Next
        End With
        End With

        'Add to sheet
        Dim ws As Worksheet, rng As Range
        Set ws = ThisWorkbook.Worksheets(1)
        With ws
        Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
        xlsArr) - 1, 1))
        rng.Value = Application.Transpose(xlsArr)
        End With

        End Sub

        Public Function NextRow(ByVal ws As Worksheet, Optional ByVal col As Variant = 1) As Long
        With ws
        NextRow = .Cells(.Rows.Count, col).End(xlUp).Row + 1
        End With
        End Function




        Breaking Down the Code



        This will loop your html table rows. We start at 1, because 0 is actually just the table header.



        With tbl.Rows
        ReDim xlsArr(1 To .Length - 1)
        For r = 1 To .Length - 1 ' 0 is the table header
        xlsArr(r) = .Item(r).Children(1).innerHTML
        Next r
        End With


        This uses regular expressions to extract the url from the innerHTML property. You can see how this particular regex works here: Regex101



        With CreateObject("VBScript.RegExp")
        .Pattern = "<a href=""(/markets.*?.xls)"
        For r = 1 To UBound(xlsArr)
        xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
        Debug.Print xlsArr(r)
        Next
        End With


        You will size your range to be the same size as your array that contains the links, then write the array to the worksheet. This is usually much faster than writing cells one-by-one.



        'Add to sheet
        Dim ws As Worksheet, rng As Range
        Set ws = ThisWorkbook.Worksheets(1)
        With ws
        Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
        xlsArr) - 1, 1))
        rng.Value = Application.Transpose(xlsArr)
        End With





        share|improve this answer














        You had the idea down correctly, but here's a different approach:



        Sub TYEX()

        Dim ie As Object
        Set ie = CreateObject("InternetExplorer.Application")
        With ie
        .navigate "https://www.jpx.co.jp/markets/public/short-selling/index.html"
        .Visible = True

        Do While .Busy Or .readyState < 4
        DoEvents
        Loop

        Dim doc As Object, tbl As Object
        Set doc = .document
        Set tbl = doc.getElementsByClassName("component-normal-table")(0).Children(0)

        Dim r As Long, xlsArr(), a As Object

        With tbl.Rows
        ReDim xlsArr(1 To .Length - 1)
        For r = 1 To .Length - 1 ' 0 is the table header
        xlsArr(r) = .Item(r).Children(1).innerHTML
        Next r
        End With

        With CreateObject("VBScript.RegExp")
        .Pattern = "<a href=""(/markets.*?.xls)"
        For r = 1 To UBound(xlsArr)
        xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
        Debug.Print xlsArr(r)
        Next
        End With
        End With

        'Add to sheet
        Dim ws As Worksheet, rng As Range
        Set ws = ThisWorkbook.Worksheets(1)
        With ws
        Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
        xlsArr) - 1, 1))
        rng.Value = Application.Transpose(xlsArr)
        End With

        End Sub

        Public Function NextRow(ByVal ws As Worksheet, Optional ByVal col As Variant = 1) As Long
        With ws
        NextRow = .Cells(.Rows.Count, col).End(xlUp).Row + 1
        End With
        End Function




        Breaking Down the Code



        This will loop your html table rows. We start at 1, because 0 is actually just the table header.



        With tbl.Rows
        ReDim xlsArr(1 To .Length - 1)
        For r = 1 To .Length - 1 ' 0 is the table header
        xlsArr(r) = .Item(r).Children(1).innerHTML
        Next r
        End With


        This uses regular expressions to extract the url from the innerHTML property. You can see how this particular regex works here: Regex101



        With CreateObject("VBScript.RegExp")
        .Pattern = "<a href=""(/markets.*?.xls)"
        For r = 1 To UBound(xlsArr)
        xlsArr(r) = "https://www.jpx.co.jp" & .Execute(xlsArr(r))(0).SubMatches(0)
        Debug.Print xlsArr(r)
        Next
        End With


        You will size your range to be the same size as your array that contains the links, then write the array to the worksheet. This is usually much faster than writing cells one-by-one.



        'Add to sheet
        Dim ws As Worksheet, rng As Range
        Set ws = ThisWorkbook.Worksheets(1)
        With ws
        Set rng = .Range(.Cells(NextRow(ws), 1), .Cells(NextRow(ws) + UBound( _
        xlsArr) - 1, 1))
        rng.Value = Application.Transpose(xlsArr)
        End With






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 23 at 9:32

























        answered Nov 23 at 9:26









        K.Dᴀᴠɪs

        6,965112139




        6,965112139






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53443254%2fretrieving-all-excel-file-links-from-a-webpage-with-excel-vba%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Futebolista

            Lallio

            Jornalista