Regular Expression with C#: Part 1

In this tutorial you will learn how to:

  1. Extract text from web pages
  2. Use a special tool (Expresso) to make writing Regular Expressions easier
  3. Write and use Regular Expressions in C#

This tutorial is intended for people who can already write some basic Regular Expressions and know the syntax of the language. Since there is a lot of material to cover, I won’t go over everything.

Until a little while ago I never really thought much of Regular Expressions. I used them a couple of times, just to validate an email address or a web address. That changed the day I thought of writing a web scraping program that would process pages from a site and store all the grabbed information in an XML file.

The need to grab page contents came up while I was studying RSS. In RSS you need to send the user a predefined output in XML format with some extra tags. In this tutorial I will not go that far, but I will give you an idea of how to grab the data from a site. After that, it is up to you how to use the knowledge.

I will take the example of a Google search results page, where the results are listed in a well-organized manner. We will pick out some text from the page for each result and display the results in our own format.

Step 1: Create a web application to host our functionality.

  1. Open Visual Studio 2005
  2. File Menu > New > Web Site
  3. Select ASP.NET Web Site from the listed project templates
  4. Select Visual C# as the language and click OK.

You will see the Default.aspx page in the Solution Explorer; if not, add it.

Step 2: Write code to get the desired page source (Google Search Results).

  1. Drag and drop a TextBox, a Button and a Label control onto the page in design mode.
  2. Rename TextBox1 to txtSearch, Button1 to btnSearch and Label1 to lblResults.
  3. In the Click event of btnSearch, write the following code.
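protected void btnSearch_Click(object sender, EventArgs e)
{
    // Fall back to a default search term when the box is empty.
    if (string.IsNullOrEmpty(txtSearch.Text.Trim()))
    {
        txtSearch.Text = "Google";
    }
    // Build the Google search URL from the escaped search term.
    // The URL format used here is an assumption, not taken from the original listing.
    string URL = "http://www.google.com/search?q=" + Uri.EscapeDataString(txtSearch.Text.Trim());
    string pageSource = WebPage.getContents(URL);
    // Strip line breaks so the source can be matched as a single line.
    // getRating is not listed in this part of the tutorial.
    string fullRating = getRating(pageSource.Replace("\r", "").Replace("\n", ""));
    lblResults.Text = fullRating;
}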

As the code shows, the handler builds a Google search URL. The code that fetches the contents of a page lives in a separate class named WebPage. This class has a static method called getContents, which takes the web page address as a string parameter and returns the page source as a string. Here is the code for this class.

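using System.IO;
using System.Net;

public class WebPage
{
    // Downloads the page at the given address and returns its source as a string.
    public static string getContents(string URL)
    {
        string strHTML = "";
        WebRequest objRequest = WebRequest.Create(URL);
        // Dispose the response and the reader once the body has been read.
        using (WebResponse webResponse = objRequest.GetResponse())
        using (StreamReader sr = new StreamReader(webResponse.GetResponseStream()))
        {
            strHTML = sr.ReadToEnd();
        }
        return strHTML;
    }
}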
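
The getContents method throws if the address is malformed or the request fails, which in a web page means an error screen for the user. Here is a minimal sketch of a more forgiving variant. The class name SafeWebPage and the method TryGetContents are my own hypothetical names, not part of the tutorial's code, and the set of exceptions caught is only a reasonable starting point.

```csharp
using System;
using System.IO;
using System.Net;

public static class SafeWebPage
{
    // Hypothetical helper (not part of the original tutorial): returns false
    // instead of throwing when the address is unusable or the request fails.
    public static bool TryGetContents(string url, out string html)
    {
        html = "";
        try
        {
            WebRequest request = WebRequest.Create(url);
            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                html = reader.ReadToEnd();
            }
            return true;
        }
        catch (UriFormatException)
        {
            // The address could not be parsed as a URI.
            return false;
        }
        catch (NotSupportedException)
        {
            // The URI scheme has no registered request handler.
            return false;
        }
        catch (WebException)
        {
            // The request was sent but failed (DNS, timeout, HTTP error, ...).
            return false;
        }
    }
}
```

In the button handler you could then check the return value and put a friendly message in lblResults when the download fails, instead of letting the exception bubble up.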

At this point there is no visible output yet, but the code can fetch the source of the web page that we will work with in the next step. In the next part we will look briefly at how to write a regular expression using a free tool called Expresso.
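
To give a rough idea of what happens to the fetched source, here is a small sketch that pulls link text and addresses out of an HTML string with a regular expression. Everything in it is an assumption for illustration: the class name LinkExtractor, the pattern, and the shape of the HTML it expects. It is not Google's actual markup, and a real results page will need a more careful pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Illustrative pattern (an assumption): matches simple <a href="...">text</a>
    // tags and captures the address and the link text in named groups.
    private static readonly Regex AnchorPattern = new Regex(
        "<a\\s+href=\"(?<url>[^\"]+)\"[^>]*>(?<text>[^<]+)</a>",
        RegexOptions.IgnoreCase);

    // Returns one "text -> url" line for every link found in the HTML.
    public static List<string> Extract(string html)
    {
        List<string> results = new List<string>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            results.Add(m.Groups["text"].Value + " -> " + m.Groups["url"].Value);
        }
        return results;
    }
}
```

You would pass the string returned by WebPage.getContents to Extract and then write the returned lines to lblResults in whatever format you like.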

Published by

Anant Anand Gupta

Microsoft Technology Professional
