Learning BASH: Text Processing - Cut Command
Text processing in Bash is a huge topic, so we will take it one command at a time.
You might think that cut means moving a file from location A to location B. But as the link here says, the cut command in Unix (or Linux) is used to select sections of text from each line of files. You can use it to select fields or columns from a line by specifying a delimiter, or you can select a portion of text by specifying a range of characters. Basically, the cut command slices a line and extracts the text.
The cut command's own help text in Linux says:
Print selected parts of lines from each FILE to standard output.
I created a text file (I am on Windows running Cygwin) and added a few lines:
This is the first line
This is the second
And this is not the last line
Finally we end
The Linux help for cut explains how positions and ranges are written:
N N'th byte, character or field, counted from 1
N- from N'th byte, character or field, to end of line
N-M from N'th to M'th (included) byte, character or field
-M from first to M'th (included) byte, character or field
Problem: Give me the first letter of every line.
Analysis: -c1 means column one (1), or character position 1. That is the N form from the help text above, with N = 1.
Note: Column numbering starts from 1, NOT zero (0).
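A minimal sketch of the first-letter problem, assuming the four sample lines are saved in a file called sample.txt (a hypothetical name; the file name used originally is not given):

```shell
# Recreate the sample file (sample.txt is an assumed name).
printf '%s\n' \
  'This is the first line' \
  'This is the second' \
  'And this is not the last line' \
  'Finally we end' > sample.txt

# -c1 selects the character at position 1 of every line.
cut -c1 sample.txt
# Output:
# T
# T
# A
# F
```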
Problem: Show me the first three characters of each line.
There are two ways to do it (maybe more, but these are the easiest, I suppose).
You can specify a range. The two examples use the -M form (-c-3) and the N-M form (-c1-3).
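Both range forms can be sketched like this, again assuming the sample lines live in a hypothetical file named sample.txt:

```shell
# Recreate the sample file (sample.txt is an assumed name).
printf '%s\n' \
  'This is the first line' \
  'This is the second' \
  'And this is not the last line' \
  'Finally we end' > sample.txt

# -M form: from the first character up to and including the 3rd.
cut -c-3 sample.txt

# N-M form: from the 1st to the 3rd character, inclusive.
cut -c1-3 sample.txt

# Each command prints:
# Thi
# Thi
# And
# Fin
```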
Problem: Get the 3rd character of each line in a file. The file is given as input by the user.
Note that there are other ways to do this.
cut reads from standard input if the file argument is "-" or absent.
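A small sketch of reading from standard input. The two input lines are taken from the sample file; piping them in with printf is only an assumption about how the user's input arrives:

```shell
# No file argument: cut reads the lines from standard input.
printf '%s\n' 'This is the first line' 'This is the second' | cut -c3
# Output:
# i
# i

# An explicit "-" argument means the same thing.
printf '%s\n' 'This is the first line' 'This is the second' | cut -c3 -
```

Redirecting a file works as well, for example cut -c3 < somefile, where somefile is whatever file the user supplies.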
The -d option of the cut command specifies the delimiter, and the -f option specifies the field position.
Note: -d needs a delimiter to be specified, and -f gives the field position(s) to print. Here I have used the first through third fields.
In this example, my delimiter is a single space. I want everything up to the third occurrence of a space, which is fields 1 through 3.
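A sketch of the delimiter and field options together, assuming the sample lines are in a hypothetical file named sample.txt:

```shell
# Recreate the sample file (sample.txt is an assumed name).
printf '%s\n' \
  'This is the first line' \
  'This is the second' \
  'And this is not the last line' \
  'Finally we end' > sample.txt

# -d' ' splits each line on single spaces; -f1-3 keeps fields 1 through 3.
cut -d' ' -f1-3 sample.txt
# Output:
# This is the
# This is the
# And this is
# Finally we end
```

The selected fields are printed with the same delimiter between them, so the spaces survive in the output.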
Problem: Given a sentence, identify and display its fourth word. Assume that the space (' ') is the only delimiter between words.
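One way to sketch this, using a made-up sentence as the input (the sentence is only an illustration, not part of the original problem):

```shell
# Split on single spaces and print only the 4th field (the 4th word).
echo 'The quick brown fox jumps' | cut -d' ' -f4
# Output:
# fox
```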
The same works with a semicolon as the delimiter:
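A sketch with semicolon-separated input. The input string is invented for illustration, and the semicolon is quoted so the shell does not treat it as a command separator:

```shell
# Split on ';' and print only the 4th field.
echo 'red;green;blue;yellow' | cut -d';' -f4
# Output:
# yellow
```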
Problem: Given a tab-delimited file with several columns (TSV format), print from the second field to the last field.
Note: The default delimiter is a tab, so you DON'T need to specify a delimiter at all when the input is tab-delimited.
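A sketch with a tiny tab-separated input. The column names and values are hypothetical and only stand in for a real TSV file:

```shell
# printf expands \t to a tab; cut needs no -d because tab is its default delimiter.
# -f2- is the N- form: from field 2 to the end of the line.
printf 'id\tname\tcity\n1\tAsha\tPune\n2\tBen\tOslo\n' | cut -f2-
# Output (columns separated by tabs):
# name	city
# Asha	Pune
# Ben	Oslo
```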
Reference: I took the problems from my favorite code competition site, HackerRank. Visit this link to practice more problems. Solve the first 9 problems, which are based on the cut command for Bash. Best of luck.