Pages

Saturday, August 14, 2010

Get Words in a text using functional way

how to get the word in a text string? The following is the LINQ resolution. Maybe it is scary, but I like this way to think about the problem.

string txt = "abc efg ghi\t abc\n\n123\t efg 345\n";

var result = txt.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                            .SelectMany((str, lineNumber) => str.Select((ch, i) => new { Ch = ch, Index = i })
                                                                .Where(item => Char.IsWhiteSpace(item.Ch) || item.Index==0)
                                                                .Select(item => new { Index = item.Index, CharList = str.Skip(item.Index==0 ? 0 : item.Index + 1).TakeWhile(ch => !Char.IsWhiteSpace(ch)).ToArray() })
                                                                .Select(item => new { Line=lineNumber, Offset = item.Index==0 ? 0 : item.Index+1, Word = new String(item.CharList) })
                                                                .Where(item => !String.IsNullOrWhiteSpace(item.Word)));

the idea is simple:

for each line, we do the following process:


  • index each character
  • find all index whose character is whitespace, this index indicates the start character of a word
  • using this index to take the character till meet another whitespace
This approach  might be the most efficient way to process this problem, but it is a good exercise to use LINQ.

No comments: