How to make a sitemap generator in Visual Basic .NET with source code

Tags: VB.NET, VB 2008, VB 2010, VB 2012, VB 2013

This tutorial shows you how to make a sitemap generator in Visual Basic .NET.

Source code is available at the end of this tutorial.

This sitemap generator works by crawling a website and indexing each link it finds.

This is a basic sitemap generator. There are probably better and faster ways to generate sitemaps, but this is what I came up with to generate sitemaps for this site and my other sites, and it has been working the way I need it to. Please report any bugs in the comments section below.
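
The crawl-and-index idea can be sketched without any networking. The example below is only an illustration: the Crawl function and the in-memory "site" dictionary are hypothetical stand-ins for real pages, not part of the project's source code. It walks a growing list of links by index, which is the same pattern the generator uses with its links list and its i counter.

```vbnet
Imports System
Imports System.Collections.Generic

Module CrawlSketch
    'Hypothetical helper: returns every page reachable from startPage.
    'site maps a page address to the links found on that page.
    Function Crawl(ByVal site As Dictionary(Of String, String()), ByVal startPage As String) As List(Of String)
        Dim links As New List(Of String)
        links.Add(startPage)
        Dim i As Integer = 0
        'The list grows while it is being walked, so use an index instead of For Each
        While i < links.Count
            For Each found As String In site(links(i))
                If Not links.Contains(found) Then
                    links.Add(found)
                End If
            Next
            i += 1
        End While
        Return links
    End Function

    Sub Main()
        'Made-up site used only for this demonstration
        Dim site As New Dictionary(Of String, String())
        site.Add("/", New String() {"/about", "/blog"})
        site.Add("/about", New String() {"/", "/contact"})
        site.Add("/blog", New String() {"/about"})
        site.Add("/contact", New String() {})

        Console.WriteLine(String.Join(", ", Crawl(site, "/").ToArray()))
        'Prints: /, /about, /blog, /contact
    End Sub
End Module
```

Each page is visited once, and the loop ends when the index catches up with the end of the list.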

Step by Step

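Add controls to the form as shown in the picture below.

[Image: vb.net sitemap generator]

The textbox that will hold the page address is named Url.

The Scan Button is named ScanButton.

The Export Button is named ExportButton.

The code below also uses Label1 for the progress text, CheckBox1 and ComboBox1 for the change frequency, and CheckBox2 and ComboBox2 for the priority. These keep their default names.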

Declarations

Declaring variables to hold some values:

    Dim sourceString As String 'Holds source code of the page being scanned for links
    Dim wb As New WebBrowser 'WebBrowser control to read all links in a page
    Dim shortUrl As String 'short link of a site without (http:// or www). Used to tell if the link refers to an external or an internal link
    Dim links As New List(Of String) 'List of each link found
    Dim i As Integer = 0 'Current Index of Each Link being processed

Load Event

Add the following form load event code:

 Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
        AddHandler wb.DocumentCompleted, New WebBrowserDocumentCompletedEventHandler(AddressOf wb_DocumentCompleted)
        wb.ScriptErrorsSuppressed = True
    End Sub

Scan Button

Add the following scan button code:

Private Sub ScanButton_Click(sender As System.Object, e As System.EventArgs) Handles ScanButton.Click
        If Url.Text = "" Then
            MsgBox("Enter a url to scan")
        Else
            links.Clear()
            i = 0 'start from the first link again when scanning another site
            ScanButton.Enabled = False
            Url.Enabled = False
            'Remove a trailing forward slash
            If Url.Text.EndsWith("/") Then
                Url.Text = Url.Text.Substring(0, Url.Text.Length - 1)
            End If
            'Build the short url used to recognize internal links
            shortUrl = Url.Text.Replace("http://", "").Replace("https://", "")
            shortUrl = shortUrl.Replace("www.", "")
            wb.Navigate(Url.Text)
        End If

    End Sub

When the user clicks the Scan button, the program first checks that the URL textbox is not empty.
It then clears the list of links left over from any previous scan.
Next it disables the Scan button and the URL textbox until scanning and indexing are done.
Finally it removes any trailing forward slash from the URL and builds the short URL, which has no http:// or www. prefix.
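
As a rough sketch of that last step, the clean-up can be pulled into a small function and tried on its own. ToShortUrl is a hypothetical name, not something in the project. Unlike the button code, this version only strips the prefixes at the start of the address, and it assumes you also want to handle https:// addresses.

```vbnet
Imports System

Module UrlHelpers
    'Hypothetical helper: strips a trailing slash, the protocol prefix and "www."
    Function ToShortUrl(ByVal address As String) As String
        Dim result As String = address.Trim()
        If result.EndsWith("/") Then
            result = result.Substring(0, result.Length - 1)
        End If
        'Assumption: https:// addresses should be shortened the same way as http://
        For Each prefix As String In New String() {"http://", "https://"}
            If result.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) Then
                result = result.Substring(prefix.Length)
            End If
        Next
        If result.StartsWith("www.", StringComparison.OrdinalIgnoreCase) Then
            result = result.Substring(4)
        End If
        Return result
    End Function

    Sub Main()
        Console.WriteLine(ToShortUrl("http://www.example.com/"))
        'Prints: example.com
    End Sub
End Module
```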

WebBrowser Navigating

The following code is the WebBrowser DocumentCompleted event handler. Each time a page finishes loading, this event tells the program to look for links in that page.

Private Sub wb_DocumentCompleted(sender As System.Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs)
        'The list is still empty when the first page loads, so only show progress once there is a link to show
        If i < links.Count Then
            Label1.Text = "Now Processing " & (i + 1) & "/" & links.Count & ": " & links(i)
            Label1.Refresh()
        End If
        GetLinks()
End Sub

Scanning the Page

The following code scans the page that was just loaded and collects all the links found. Internal links are indexed and external ones are ignored. Links to images and other media files are not indexed either.

Private Sub GetLinks()
        'Extensions to ignore. Add as many as needed: there are links to mp3, mp4, rar, zip that you should also consider.
        Dim ignoredExtensions As String() = {".jpg", ".jpeg", ".png", ".wmv"}
        If wb.Document IsNot Nothing Then
            For Each ClientControl As HtmlElement In wb.Document.Links
                Dim href As String = ClientControl.GetAttribute("href")
                Dim lowerHref As String = href.ToLower()
                'Skip links already indexed and links to anchors inside a page
                If links.Contains(href) OrElse href.Contains("#") Then Continue For
                'Skip links to images and other media files
                If Array.Exists(ignoredExtensions, Function(ext) lowerHref.Contains(ext)) Then Continue For
                'Index internal links only
                If href.Contains(shortUrl) Then
                    links.Add(href)
                End If
            Next
        End If
        'Always move on, so the buttons are re-enabled even when no links were found
        NextLink()
    End Sub
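
One thing to watch with the Contains(shortUrl) check is that it also matches external addresses that merely mention your domain, for example in a query string. A possible alternative, shown here only as a sketch, is to compare host names with the System.Uri class. IsInternalLink and its siteHost parameter are hypothetical names, not part of the project.

```vbnet
Imports System

Module LinkFilter
    'Hypothetical helper: True when href points to the same host as siteHost
    '(siteHost is the short url, for example "example.com")
    Function IsInternalLink(ByVal href As String, ByVal siteHost As String) As Boolean
        Dim parsed As Uri = Nothing
        'Relative or malformed addresses are treated as not internal in this sketch
        If Not Uri.TryCreate(href, UriKind.Absolute, parsed) Then
            Return False
        End If
        Dim host As String = parsed.Host
        If host.StartsWith("www.", StringComparison.OrdinalIgnoreCase) Then
            host = host.Substring(4)
        End If
        Return String.Equals(host, siteHost, StringComparison.OrdinalIgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(IsInternalLink("http://www.example.com/about", "example.com"))
        'Prints: True
        Console.WriteLine(IsInternalLink("http://other.com/?ref=example.com", "example.com"))
        'Prints: False
    End Sub
End Module
```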

Go to the Next Page in the Website

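After it has finished scanning a page and storing all the links found, the program moves to the next link in the list. It downloads that page's source, breaks the tags that slow down loading, and hands the result to the WebBrowser control.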

Private Sub NextLink()
        i += 1
        If links.Count > 0 AndAlso i < links.Count Then
            Try
                Using client As New System.Net.WebClient()
                    sourceString = client.DownloadString(links(i))
                End Using
                'Breaking script, image, stylesheet and ad tags speeds up the process
                sourceString = sourceString.Replace("<script", "")
                sourceString = sourceString.Replace("</script>", "")
                sourceString = sourceString.Replace("<img", "")
                sourceString = sourceString.Replace("<link ", "")
                sourceString = sourceString.Replace("<ins ", "")
                sourceString = sourceString.Replace("(adsbygoogle ", "")
                wb.DocumentText = sourceString

            Catch ex As Exception
                'Remove links the server answers with an error (404 and similar); any other failure just skips to the next link
                Dim webEx As System.Net.WebException = TryCast(ex, System.Net.WebException)
                If webEx IsNot Nothing AndAlso webEx.Status = System.Net.WebExceptionStatus.ProtocolError Then
                    links.RemoveAt(i)
                    i -= 1 'the following link has moved into the removed slot
                End If
                NextLink()
            End Try
        Else
            MsgBox("Done")
            ExportButton.Enabled = True
            ScanButton.Enabled = True
            Url.Enabled = True
            Url.Clear()
        End If
    End Sub

Exporting the sitemap.xml

The following code is the export button click event:

Private Sub ExportButton_Click(sender As System.Object, e As System.EventArgs) Handles ExportButton.Click

        Dim FileLocation As String = "C:\Users\Jimmy\Desktop\sitemap.xml" 'change to your path
        Dim xml As New System.Text.StringBuilder()

        xml.AppendLine("<?xml version=""1.0"" encoding=""UTF-8""?>")
        xml.AppendLine("<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">")
        For Each item As String In links
            xml.AppendLine("<url>")
            'Escape characters such as & that cannot appear as-is inside XML
            xml.AppendLine("<loc>" & System.Security.SecurityElement.Escape(item) & "</loc>")
            If CheckBox1.CheckState = CheckState.Checked Then
                xml.AppendLine("<changefreq>" & ComboBox1.Text & "</changefreq>")
            End If
            If CheckBox2.CheckState = CheckState.Checked Then
                xml.AppendLine("<priority>" & ComboBox2.Text & "</priority>")
            End If
            xml.AppendLine("</url>")
        Next
        xml.Append("</urlset>")
        'Write the file once instead of reopening it for every line
        My.Computer.FileSystem.WriteAllText(FileLocation, xml.ToString(), False)
    End Sub

The above code also checks whether or not to add the frequency and the priority parameters to the sitemap. Since they are optional, the user has the option to add them or ignore them.
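
If you would rather not build the XML by hand, the same file can probably be produced with the XmlWriter class from System.Xml, which takes care of escaping for you. The sketch below is an alternative to the export code, not what the project uses. WriteSitemap, its parameters and the temp-folder path are made up for the example, and passing Nothing for changeFreq leaves that element out.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module SitemapWriter
    Const SitemapNs As String = "http://www.sitemaps.org/schemas/sitemap/0.9"

    'Hypothetical helper: writes a sitemap for the given links to filePath
    Sub WriteSitemap(ByVal filePath As String, ByVal links As IEnumerable(Of String), ByVal changeFreq As String)
        Dim settings As New XmlWriterSettings()
        settings.Indent = True
        Using writer As XmlWriter = XmlWriter.Create(filePath, settings)
            writer.WriteStartDocument()
            writer.WriteStartElement("urlset", SitemapNs)
            For Each link As String In links
                writer.WriteStartElement("url", SitemapNs)
                'WriteElementString escapes characters such as & automatically
                writer.WriteElementString("loc", SitemapNs, link)
                If changeFreq IsNot Nothing Then
                    writer.WriteElementString("changefreq", SitemapNs, changeFreq)
                End If
                writer.WriteEndElement()
            Next
            writer.WriteEndElement()
            writer.WriteEndDocument()
        End Using
    End Sub

    Sub Main()
        'Example values only
        Dim links As New List(Of String)
        links.Add("http://www.example.com/")
        links.Add("http://www.example.com/page?a=1&b=2")

        Dim filePath As String = Path.Combine(Path.GetTempPath(), "sitemap.xml")
        WriteSitemap(filePath, links, "weekly")
        Console.WriteLine(File.ReadAllText(filePath))
    End Sub
End Module
```

The second example link contains an ampersand, which ends up as &amp;amp; in the file so the sitemap stays valid XML.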

Enabling Frequency Option

The following code is the frequency option checkbox code:

Private Sub CheckBox1_CheckedChanged(sender As System.Object, e As System.EventArgs) Handles CheckBox1.CheckedChanged
        'The frequency list is only usable while the checkbox is checked
        ComboBox1.Enabled = (CheckBox1.CheckState = CheckState.Checked)
End Sub

Enabling Priority Option

The following code is the priority option checkbox code:

Private Sub CheckBox2_CheckedChanged(sender As System.Object, e As System.EventArgs) Handles CheckBox2.CheckedChanged
        'The priority list is only usable while the checkbox is checked
        ComboBox2.Enabled = (CheckBox2.CheckState = CheckState.Checked)
End Sub

Please report all bugs in the comments section below. Also below is the source code for this project. Feel free to modify and use it.

Download Sitemap generator source code



Copyright visual-basic-tutorials.com 2017 - All Rights Reserved.