More Fun with Regular Expressions: Word and Paragraph Parsing

Trolling the ASP.NET forums again this morning (I know, I do it a lot), I found a question about parsing the paragraphs out of a block of text, so I knew I had to answer it. The regular expression needed is '(.+)'. This tells the Regex object to match a run of one or more characters of any kind except a line feed. Each match is therefore one non-empty line of the text, which works when every paragraph sits on its own line. Code for this solution would look like this:

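// Requires: using System.IO; using System.Text.RegularExpressions;
public static MatchCollection GetParagraphs()
{
    using (StreamReader sr = new StreamReader(@"{Path To Sample File}\SampleText.txt"))
    {
        // Read the whole file, then match each run of characters up to a line feed.
        string textFromFile = sr.ReadToEnd();
        Regex rg = new Regex(@"(.+)");
        return rg.Matches(textFromFile);
    }
}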
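One thing to watch for with '(.+)': in .NET the dot excludes only the line feed, so on a file with Windows line endings each match keeps its trailing carriage return, and a blank line still produces a match that contains nothing but that carriage return. The following is a minimal sketch of the difference, using an in-memory string instead of a file. The class name ParagraphDemo, the helper CountParagraphs and the pattern '[^\r\n]+' are my own assumptions for illustration, not part of the forum answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParagraphDemo
{
    // Hypothetical helper: counts the matches a pattern finds in a string.
    public static int CountParagraphs(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        // Two paragraphs separated by a blank line, with Windows (\r\n) line endings.
        string sample = "First paragraph.\r\n\r\nSecond paragraph.\r\n";

        // "." matches \r, so the blank line's lone \r counts as a match.
        Console.WriteLine(CountParagraphs(sample, @"(.+)"));      // prints 3

        // Excluding both \r and \n skips the blank line entirely.
        Console.WriteLine(CountParagraphs(sample, @"[^\r\n]+"));  // prints 2
    }
}
```

If you would rather keep the original expression, calling Trim() on each match value and skipping the empty results gets you to the same place.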

I thought I would extend this to get a word count as well as all the words. In this case the expression is '(\w+)', which matches a run of one or more word characters (letters, digits and the underscore).

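// Requires: using System.IO; using System.Text.RegularExpressions;
public static MatchCollection GetWords()
{
    using (StreamReader sr = new StreamReader(@"{Path To Sample File}\SampleText.txt"))
    {
        // Read the whole file, then match each run of word characters.
        string textFromFile = sr.ReadToEnd();
        Regex rg = new Regex(@"(\w+)");
        return rg.Matches(textFromFile);
    }
}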
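Because '\w' does not cover apostrophes or hyphens, words such as "Don't" or "well-known" come back as two matches each, which inflates the count. Here is a rough sketch of that effect on an in-memory string. The class name WordDemo, the helper CountWords and the second pattern are assumptions of mine, offered as one possible tweak and not as the only way to define a word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordDemo
{
    // Hypothetical helper: counts the matches a pattern finds in a string.
    public static int CountWords(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        string sample = "Don't split well-known words.";

        // Plain \w+ splits at the apostrophe and the hyphen:
        // Don, t, split, well, known, words
        Console.WriteLine(CountWords(sample, @"\w+"));               // prints 6

        // Allowing an apostrophe or hyphen between runs of word characters:
        // Don't, split, well-known, words
        Console.WriteLine(CountWords(sample, @"\w+(?:['-]\w+)*"));   // prints 4
    }
}
```

Which count is right depends on what you want to call a word, so treat the second pattern as a starting point.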

Calling the Regex.Matches method returns a MatchCollection. Its Count property gives the number of matches, and the collection can be enumerated to get the actual matches.

public static void WriteMatchCollectionResults(MatchCollection mc)
{
    // Print the number of matches, then each matched value on its own line.
    Console.WriteLine(mc.Count);
    foreach (Match m in mc)
    {
        Console.WriteLine(m.Value);
    }
    Console.WriteLine("...........................................");
    Console.WriteLine("");
}
