Sunday, December 25, 2016

How to extract desired strings from a text file with a regex pattern search

In this article I would like to share a tip for copying a desired set of strings out of a large text file. I use Notepad++ to explain it, and you need a basic knowledge of how regular expressions work.

Say I have a text file like the one below.

DTP*472*RD8*20150908-20150908~ TRN*2*435577~ STC*A2:20:PR*20160906*WQ*4027.2~ REF*1K*1625000104243~ REF*BLT*131~ DTP*472*RD8*20150609-20150609~ TRN*2*447535~ STC*A2:20:PR*20160906*WQ*436.8~ REF*1K*1625000104244~ REF*BLT*131~ DTP*472*RD8*20150713-20150713~ TRN*2*500331~ STC*A2:20:PR*20160906*WQ*3941.6~ REF*1K*1625000104245~ REF*BLT*131~ DTP*472*RD8*20150812-20150812~ TRN*2*484995~ STC*A2:20:PR*20160906*WQ*436.8~ REF*1K*1625000104246~ REF*BLT*131~ DTP*472*RD8*20150730-20150730~ HL*18*3*PT~ NM1*QC*1*ROSS*RICHARD****MI*334589625A~TRN*2*354771~STC*A7:673*20160906*U*8613.98~STC*A7:509*20160906*U*8613.98~TRN*2*516925~STC*A2:20:PR*20160906*WQ*4204.23~ REF*1K*1625000104247~ REF*BLT*131~ DTP*472*RD8*20150427-20150427~ TRN*2*361769~ STC*A7:460*20160906*U*434.2~ REF*BLT*121~ DTP*472*RD8*20150423-20150423~ TRN*2*268379~ STC*A7:255*20160906*U*25952.66~ REF*BLT*111~ DTP*472*RD8*20150209-20150212~ HL*19*3*PT~ NM1*QC*1*BLACKBURN*LALAH****MI*326388373A~ TRN*2*397463~ STC*A7:673*20160906*U*2400.3~

I need to extract only the values shown in red, that is, the numbers that follow TRN*2*. How can we achieve this?
Here is how it works, step by step.

Note: When you copy any text from this page, trim it first before pasting it into the Notepad++ search fields.
  1. Copy the above text into Notepad++.
  2. Press Ctrl + F to open the search window.
  3. Find a pattern that matches the desired strings. (In this case the pattern is TRN*2*xxxxxx~. The regex search expression for this pattern is (TRN\*2\*)([0-9]+)([~]))
  4. Next we need to put every match on a new line. To do this, press Ctrl+H. Enter the regex expression (TRN\*2\*)([0-9]+)([~]) in the Find what field and \n$1$2$3 in the Replace with field. $1$2$3 refer to the consecutive groups in the regex expression above. (You need to use your own regex expression and groups.) Make sure you select the Regular expression radio button in the Search Mode block of the Replace window, and make sure your cursor is placed at the first character of your text.

  5. Press the Replace All button. Each match now starts on its own line, as in the image below.
  6. Change the regular expression from (TRN\*2\*)([0-9]+)([~]) to (TRN\*2\*)([0-9]+)([~]).* and click the Replace All button again. The trailing .* also matches the rest of each line, so the text after each match is dropped.
  7. The text now looks like the image below.
  8. Click the Mark tab in the Replace window and check the Bookmark line checkbox.
  9. Click the Mark All button. Every line that contains a match is now bookmarked, as in the image below.
  10. From the Search menu go to Bookmark and click Remove Unmarked Lines. This removes all unmarked lines, so the file contains only the matched lines.
  11. Click the Replace tab again and change the value of the Replace with field to $2, which refers to the second group of the regular expression.
  12. Click the Replace All button.
We now have the desired values from the above text.
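
If you prefer a script to the editor, the same extraction can be sketched in Python. This is only a minimal sketch, not part of the Notepad++ procedure: it reuses the regex from the steps above, while the names SAMPLE, TRN_PATTERN and extract_trn_values are made up for illustration, and SAMPLE is a shortened copy of the text above.

```python
import re

# Shortened copy of the sample text from this article (illustrative only).
SAMPLE = (
    "DTP*472*RD8*20150908-20150908~ TRN*2*435577~ "
    "STC*A2:20:PR*20160906*WQ*4027.2~ REF*1K*1625000104243~ "
    "REF*BLT*131~ DTP*472*RD8*20150609-20150609~ TRN*2*447535~ "
    "STC*A2:20:PR*20160906*WQ*436.8~"
)

# Same expression as in the steps: group 2 holds the digits we want.
TRN_PATTERN = re.compile(r"(TRN\*2\*)([0-9]+)([~])")


def extract_trn_values(text):
    """Return the digits of every TRN*2*<digits>~ segment, in order."""
    return [match.group(2) for match in TRN_PATTERN.finditer(text)]


if __name__ == "__main__":
    # Print one extracted value per line, like the final Notepad++ result.
    for value in extract_trn_values(SAMPLE):
        print(value)
```

Taking group 2 of each match plays the same role as replacing with $2 in step 11, without the intermediate line splitting and bookmarking.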

Thanks for your patience in reading the whole article. Any feedback or comments are much appreciated.
