Regular Expression with C#: Part 1

In this tutorial you will learn how to:

  1. Extract text from web pages
  2. Use a dedicated tool (Expresso) to make writing regular expressions easier
  3. Write and use regular expressions in C#

This tutorial is intended for readers who already know basic regular expression syntax. There is a lot of material to cover, so I won’t go over everything.

Until recently I never thought much of regular expressions. I had used them a couple of times to validate an email address or a web address. That changed the day I decided to write a web scraping program that would process pages from a site and store the extracted information in an XML file.

The need to grab page contents came up while I was studying RSS. With RSS you send the user output in a predefined XML format with some extra tags. This tutorial will not go that far, but it will give you an idea of how to grab the data from a site. How you use that knowledge is up to you.

I will use a Google search results page as the example, since its results are listed in a well-organized manner. We will extract some text from the page for each result and display it in our own format.
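To make the idea concrete, here is a minimal sketch of pulling a title and a URL out of each link in a piece of HTML. The `ResultExtractor` class, the pattern, and the output format are illustrative assumptions, not the markup Google actually serves; a real results page needs a pattern written against its actual source.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ResultExtractor
{
    // Assumed pattern: an anchor tag whose first attribute is href.
    // Named groups capture the link target and the link text.
    private static readonly Regex LinkPattern = new Regex(
        "<a\\s+href=\"(?<url>[^\"]+)\"[^>]*>(?<title>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one "title -> url" string per link found in the HTML.
    public static List<string> Extract(string html)
    {
        List<string> results = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            results.Add(m.Groups["title"].Value + " -> " + m.Groups["url"].Value);
        }
        return results;
    }
}
```

Each string returned by `ResultExtractor.Extract` could then be written to a label or any other output in whatever layout you prefer.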

Step 1: Create a web application to host our functionality.

  1. Open Visual Studio 2005
  2. File Menu > New > Web Site
  3. Select ASP.NET Web Site from the listed project templates
  4. Select Visual C# as the language and click OK.

You should see the Default.aspx page in Solution Explorer; if not, add it.

Step 2: Write code to get the desired page source (Google Search Results).

  1. Drag and drop a TextBox, a Button and a Label control onto the page in design mode.
  2. Rename TextBox1 to txtSearch, Button1 to btnSearch and Label1 to lblResults.
  3. In the btnSearch Click event handler, write the following code:
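protected void btnSearch_Click(object sender, EventArgs e)
{
    // Fall back to a default value when the text box is empty.
    if (string.IsNullOrEmpty(txtSearch.Text.Trim()))
    {
        txtSearch.Text = "Google";
    }
    string URL = txtSearch.Text;
    // WebPage.getContents and getRating are helper methods assumed to be defined elsewhere in the project.
    string IMDBData = WebPage.getContents(URL);
    // Strip carriage returns and line feeds so the pattern can match across line breaks.
    string fullRating = getRating(IMDBData.Replace("\r", "").Replace("\n", ""));
    lblResults.Text = fullRating;
}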
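
The handler relies on a `WebPage.getContents` helper that is not listed above. As a rough sketch only, assuming the helper simply downloads the page source as a string, it could look something like the following. The `WebPageSketch` class and both method names are hypothetical; `WebClient.DownloadString` is the standard .NET call used to fetch the page.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical stand-in for the WebPage helper; not the author's implementation.
public static class WebPageSketch
{
    // Downloads the raw HTML of the given address as a string.
    public static string GetContents(string url)
    {
        using (WebClient client = new WebClient())
        {
            // Assumption: the page is served as UTF-8.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }

    // Removes carriage returns and line feeds so that a pattern
    // can match text that was split across several lines.
    public static string StripLineBreaks(string text)
    {
        return text.Replace("\r", "").Replace("\n", "");
    }
}
```

Removing `\r` and `\n` separately covers Windows, Unix and old Mac line endings without depending on the order of the replacements.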
