Extracting links from sitemap.xml into a Python array (single-dimensional)

Google Colab indexing practice:

In the previous article I shared a Google Colab notebook (.ipynb) that submits URLs in bulk. If you don't already have a bulk list of URLs for your site and it uses WordPress as its CMS, simply type your domain followed by /sitemap.xml, for example "howtocfdtrade.com/sitemap.xml" (notice the "sitemap.xml" added to the domain). Then click posts.xml (or whichever entry contains the word "post") in the list.
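If you would rather locate the posts sitemap from Python instead of clicking through the browser, here is a minimal sketch. It assumes the top-level sitemap is a standard sitemap index in the sitemaps.org namespace. The helper names fetch_xml and find_post_sitemaps are hypothetical and are not part of the Colab notebook from the previous article. Be aware that a plain "post" match can also pick up other files: WordPress core's own sitemaps name the pages sitemap "wp-sitemap-posts-page-1.xml", so a narrower keyword such as "posts-post" may be needed.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace used by standard sitemaps (sitemaps.org protocol 0.9)
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_xml(url: str) -> bytes:
    """Download a sitemap and return its raw bytes (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def find_post_sitemaps(index_xml, keyword: str = "post") -> list[str]:
    """Return the <loc> of every child sitemap whose URL contains `keyword`.

    `index_xml` may be str or bytes holding a <sitemapindex> document.
    """
    root = ET.fromstring(index_xml)
    locs = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]
    return [loc for loc in locs if keyword in loc]
```

For example, find_post_sitemaps(fetch_xml("https://howtocfdtrade.com/sitemap.xml"), "posts-post") would list the post sitemaps, provided the host answers Python's default request (some hosts may reject requests that do not look like they come from a browser).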

You will land on an HTML page that lists the URLs in a table.
Now you just have to extract the URLs you want to index; the latest posts usually appear last.
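The table is only a styled view of an XML file, so the same extraction can also be sketched in Python. This assumes a standard <urlset> sitemap in the sitemaps.org namespace; sitemap_urls and latest_urls are hypothetical helper names. The "newest last" ordering is what the author observes on this site; the sitemap format itself does not guarantee any order.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol 0.9)
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(urlset_xml) -> list[str]:
    """Return every <url><loc> value from a sitemap, in document order."""
    root = ET.fromstring(urlset_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def latest_urls(urls: list[str], count: int) -> list[str]:
    """Keep the last `count` URLs, assuming the newest posts are listed last."""
    # Guard against count <= 0, because urls[-0:] would return the whole list
    return urls[-count:] if count > 0 else []
```

Slicing from the end keeps the original order, so the result can be pasted into the notebook as-is.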

Open Firefox's developer tools (right-click -> Inspect), then switch from the Inspector tab to the Console tab (the second one).

Once it is open, type (or paste) the following code:

JavaScript Code:

// JavaScript code to be executed in the browser console

// Get the sitemap table element by its ID
var table = document.getElementById('sitemap__table');

// Initialize an array to hold the URLs
var urls = [];

// Iterate over each row, starting at index 1 to skip the header row
for (var i = 1, row; row = table.rows[i]; i++) {
  // Get the link in the first column of the row
  var link = row.cells[0].querySelector('a');
  // Add the URL to the array (rows without a link are skipped)
  if (link) {
    urls.push(link.href);
  }
}

// Format the URLs as a Python list; JSON.stringify wraps each URL in double quotes
var pythonArray = "urls = [" + urls.map(url => JSON.stringify(url)).join(", ") + "]";

// Print the Python array format to the console
console.log(pythonArray);

 

Result:

Remove the "urls = " prefix from the output and you will be left with a list like the one below. Copy it and paste it into your Google Colab (.ipynb) notebook.

["https://howtocfdtrade.com/2024/06/05/the-stalled-or-deliberation-candlestick-pattern/", "https://howtocfdtrade.com/2024/06/05/the-advance-block-pattern/", "https://howtocfdtrade.com/2024/06/05/bullish-downside-gap-two-rabbits/", "https://howtocfdtrade.com/2024/06/05/bullish-concealing-baby-swallow/", "https://howtocfdtrade.com/2024/06/05/bullish-ladder-bottom/", "https://howtocfdtrade.com/2024/06/05/the-kicker-pattern/", "https://howtocfdtrade.com/2024/06/05/bullish-unique-three-river-bottom/", "https://howtocfdtrade.com/2024/06/05/three-stars-in-the-south/", "https://howtocfdtrade.com/2024/06/05/bullish-decent-block/", "https://howtocfdtrade.com/2024/06/05/bullish-after-bottom-gap-up/", "https://howtocfdtrade.com/2024/06/05/bullish-deliberation-block/", "https://howtocfdtrade.com/2024/06/05/on-neck-line/", "https://howtocfdtrade.com/2024/06/05/bullish-after-bottom-gap-up-2/", "https://howtocfdtrade.com/2024/06/05/in-neck-line/", "https://howtocfdtrade.com/2024/06/05/thrusting-candlestick-pattern/", "https://howtocfdtrade.com/2024/06/05/side-by-side-white-lines-pattern/"]
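The list part of the console output is a valid Python list literal. If you would rather keep the copied text as a string in the notebook and convert it there, here is a minimal standard-library sketch. parse_url_list is a hypothetical helper, not part of the Colab notebook from the previous article. It accepts the text with or without the "urls = " prefix, rejects anything that is not an absolute http(s) URL, and drops duplicates while keeping the original order.

```python
import ast
import re
from urllib.parse import urlparse


def parse_url_list(text: str) -> list[str]:
    """Convert the copied console output into a Python list of URLs."""
    # Accept the text with or without the leading "urls = " assignment
    literal = re.sub(r"^\s*urls\s*=\s*", "", text).strip()
    # literal_eval only evaluates Python literals, so pasted text cannot run code
    urls = ast.literal_eval(literal)
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        raise ValueError("expected a list of URL strings")
    for u in urls:
        parts = urlparse(u)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {u!r}")
    # Drop duplicates while keeping the original order
    return list(dict.fromkeys(urls))
```

For example, urls = parse_url_list(copied_text), where copied_text (a hypothetical variable) holds the console output.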

If you don't have the Google Colab notebook running yet, you can follow this tutorial.

Note:

If everything goes well, you should see output like this:

https://howtocfdtrade.com/2024/06/05/the-stalled-or-deliberation-candlestick-pattern/
https://howtocfdtrade.com/2024/06/05/the-advance-block-pattern/
https://howtocfdtrade.com/2024/06/05/bullish-downside-gap-two-rabbits/
https://howtocfdtrade.com/2024/06/05/bullish-concealing-baby-swallow/
https://howtocfdtrade.com/2024/06/05/bullish-ladder-bottom/
https://howtocfdtrade.com/2024/06/05/the-kicker-pattern/
https://howtocfdtrade.com/2024/06/05/bullish-unique-three-river-bottom/
https://howtocfdtrade.com/2024/06/05/three-stars-in-the-south/
https://howtocfdtrade.com/2024/06/05/bullish-decent-block/
https://howtocfdtrade.com/2024/06/05/bullish-after-bottom-gap-up/
https://howtocfdtrade.com/2024/06/05/bullish-deliberation-block/
https://howtocfdtrade.com/2024/06/05/on-neck-line/
https://howtocfdtrade.com/2024/06/05/bullish-after-bottom-gap-up-2/
https://howtocfdtrade.com/2024/06/05/in-neck-line/
https://howtocfdtrade.com/2024/06/05/thrusting-candlestick-pattern/
https://howtocfdtrade.com/2024/06/05/side-by-side-white-lines-pattern/
RESULT:
**************************************************
URLs and Update Request Types Configured!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************
Successfully Done!
**************************************************

Otherwise, an error message describing the exact problem will be shown.
