More Fun with Regular Expressions: Word and Paragraph Parsing
Trolling the ASP.NET forums again this morning (I know, I do it a lot), I found a question about parsing the paragraphs out of a block of text, so I knew I had to answer it. The regular expression needed is '(.+)'. This tells the Regex object to match a run of one or more characters other than a newline, because '.' matches any character except '\n'. Each match is therefore one non-empty line of the text, which gives you one match per paragraph when paragraphs are separated by line breaks. Code for this solution would look like this:
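// Requires: using System.IO; using System.Text.RegularExpressions;
public static MatchCollection GetParagraphs()
{
    using (StreamReader sr = new StreamReader(@"{Path To Sample File}\SampleText.txt"))
    {
        // Read the whole file so the regex sees every line.
        string textFromFile = sr.ReadToEnd();
        // '.' matches any character except '\n', so each match is one non-empty line.
        Regex rg = new Regex(@"(.+)");
        return rg.Matches(textFromFile);
    }
}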
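One detail of the '(.+)' pattern is worth a small sketch: '.' excludes only '\n', so on text with Windows line endings ('\r\n') each match keeps a trailing '\r', and a blank line between paragraphs produces a match containing just '\r'. The helper names below (ParagraphSketch, MatchValues, and this in-memory GetParagraphs overload) are made up for illustration, and the '[^\r\n]+' pattern is one possible alternative rather than the only fix.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParagraphSketch
{
    // Hypothetical helper: returns the text of every match of 'pattern' in 'text'.
    public static List<string> MatchValues(string text, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            values.Add(m.Value);
        }
        return values;
    }

    // Hypothetical helper: one entry per non-empty line, with no trailing '\r'.
    // Excluding both '\r' and '\n' handles Windows and Unix line endings.
    public static List<string> GetParagraphs(string text)
    {
        return MatchValues(text, @"[^\r\n]+");
    }
}
```

For the string "First paragraph.\r\n\r\nSecond paragraph.", MatchValues with '(.+)' returns three matches (the first ends in '\r' and the second is a lone '\r'), while GetParagraphs returns the two paragraphs only.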
I thought I would extend this to get a word count as well as all the words. In this case the expression is '(\w+)', which matches a run of one or more word characters (letters, digits, and the underscore).
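// Requires: using System.IO; using System.Text.RegularExpressions;
public static MatchCollection GetWords()
{
    using (StreamReader sr = new StreamReader(@"{Path To Sample File}\SampleText.txt"))
    {
        // Read the whole file so the regex sees every word.
        string textFromFile = sr.ReadToEnd();
        // '\w+' matches a run of letters, digits, or underscores.
        Regex rg = new Regex(@"(\w+)");
        return rg.Matches(textFromFile);
    }
}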
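Because '\w' does not match apostrophes or hyphens, '(\w+)' counts "It's" as two words and "well-known" as two words. The sketch below compares the pattern from the post with one possible alternative that keeps such words together. WordSketch, its constants, and CountWords are hypothetical names, and whether a hyphenated term should count as one word is an assumption you may want to change.

```csharp
using System.Text.RegularExpressions;

public static class WordSketch
{
    // The pattern from the post: runs of letters, digits, and underscores.
    public const string SimpleWord = @"\w+";

    // Assumed alternative: word runs joined by a single apostrophe or hyphen.
    public const string JoinedWord = @"\w+(?:['-]\w+)*";

    // Counts the matches of 'pattern' in 'text'.
    public static int CountWords(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Count;
    }
}
```

For the sentence "It's a well-known fact.", CountWords returns 6 with SimpleWord and 4 with JoinedWord.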
Calling the Regex.Matches method returns a MatchCollection. Its Count property gives the number of matches, and the collection can be enumerated to get the actual matches.
public static void WriteMatchCollectionResults(MatchCollection mc)
{
    Console.WriteLine(mc.Count);
    foreach (Match m in mc)
    {
        Console.WriteLine(m.Value);
    }
    Console.WriteLine("...........................................");
    Console.WriteLine("");
}
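If you would rather work with the matched text as a plain list of strings than print it, the enumeration above can be sketched with LINQ. MatchCollectionSketch and ToValues are hypothetical names, not part of the original solution.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchCollectionSketch
{
    // Hypothetical helper: copies each match's text into a list.
    // Cast<Match>() is used because MatchCollection only implements the
    // non-generic IEnumerable on the .NET Framework.
    public static List<string> ToValues(MatchCollection mc)
    {
        return mc.Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

The resulting list can then be passed to anything that accepts strings, for example string.Join to put all the words on one line.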