
Learning BASH: Text Processing - Cut Command


Text processing in Bash is a huge topic, so we will take it one command at a time.


CUT COMMAND


You might think cut means moving a file from location A to location B. But as the link here says, the cut command in Unix (or Linux) selects sections of text from each line of a file. You can select fields or columns from a line by specifying a delimiter, or you can select a portion of text by specifying a range of characters. Basically, the cut command slices a line and extracts the text.

The help text of the cut command on Linux itself says:



Print selected parts of lines from each FILE to standard output.

I created a text file, foo.txt (I am on Windows running Cygwin), and added a few lines:

This is the first line
This is the second
And this is not the last line
Finally we end
Good Bye

The Linux help says:


  N     N'th byte, character or field, counted from 1
  N-    from N'th byte, character or field, to end of line
  N-M   from N'th to M'th (included) byte, character or field
  -M    from first to M'th (included) byte, character or field
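To see the four range forms side by side, here is a small sketch that runs each one against a fixed string. The sample string and the variable names are my own, chosen only for illustration.

```shell
# Sample input; the text is an arbitrary choice for illustration.
line='abcdefgh'

nth=$(printf '%s\n' "$line" | cut -c3)        # N   : only the 3rd character
from_n=$(printf '%s\n' "$line" | cut -c3-)    # N-  : 3rd character to end of line
n_to_m=$(printf '%s\n' "$line" | cut -c3-5)   # N-M : 3rd through 5th character
up_to_m=$(printf '%s\n' "$line" | cut -c-5)   # -M  : 1st through 5th character

printf '%s\n' "$nth" "$from_n" "$n_to_m" "$up_to_m"
```

This should print c, cdefgh, cde and abcde, one per line.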


Problem: Give me the first letter of every line.

Solution:


$ cut -c1 foo.txt
T
T
A
F
G

Analysis: -c1 selects column 1, that is, character position 1 of each line. This is the N form from the help text above.
Note: Column numbering starts from 1, NOT zero (0).

Problem: Show me the first three characters of each line.
Solution:


$ cut -c1-3 foo.txt
Thi
Thi
And
Fin
Goo

$ cut -c-3 foo.txt
Thi
Thi
And
Fin
Goo

These are two ways to do it. There may be more, but I suppose these are the easiest.
You can specify a RANGE. The first example uses the N-M form and the second uses the -M form.
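As one more way, cut also accepts a comma-separated list of positions and ranges. A minimal sketch, using a single line borrowed from foo.txt as a stand-in for the file (the variable names are mine):

```shell
# One line from the sample file, used here instead of foo.txt itself.
sample='This is the first line'

listed=$(printf '%s\n' "$sample" | cut -c1,2,3)     # positions listed one by one
mixed=$(printf '%s\n' "$sample" | cut -c1-4,9-11)   # two ranges joined with a comma

printf '%s\n' "$listed" "$mixed"
```

The first call gives Thi again, and the second glues characters 1-4 and 9-11 together as Thisthe.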

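Problem: Get the 3rd character of each line in a file. The file name is given as input by the user.

Solution:

read -r file        # file name typed by the user
cut -c3 "$file"     # 3rd character of every line in that file

Note that there are other ways to do this. If the text itself arrives on standard input, cut needs no file argument at all:

cut -c3

Note: cut reads from standard input if the file argument is "-" or absent.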
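The three ways of feeding cut can be compared with a quick sketch. It assumes mktemp is available (it is on most Linux systems and on Cygwin), and the temporary file only stands in for foo.txt.

```shell
# Build a throwaway copy of part of the sample file.
tmp=$(mktemp)
printf '%s\n' 'This is the first line' 'Good Bye' > "$tmp"

from_arg=$(cut -c3 "$tmp")          # file name given as an argument
from_pipe=$(cat "$tmp" | cut -c3)   # no file argument: cut reads the pipe
from_dash=$(cut -c3 - < "$tmp")     # "-" explicitly means standard input

printf '%s\n' "$from_arg"
rm -f "$tmp"
```

All three variables should hold the same two characters, i and o, on separate lines.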


Using Delimiters



The -d option of the cut command specifies the delimiter, and the -f option specifies which field(s) to print.

$ cut -d$' ' -f-3 foo.txt
This is the
This is the
And this is
Finally we end
Good Bye

Note: -d takes the delimiter as its argument, and -f gives the field positions. Here I have asked for the first through third fields.

In the above example, my delimiter is a single space, so I see everything up to the 3rd occurrence of a space. Lines with fewer than three fields, such as "Good Bye", are printed whole.
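One detail worth knowing here: a line that contains no delimiter at all is printed unchanged, unless you add the -s option, which suppresses such lines. A small sketch with two made-up lines (not from foo.txt):

```shell
# Two hypothetical lines: one with spaces, one without any.
input=$(printf '%s\n' 'one two three four' 'nospaces')

# Default: the line without a space is passed through as-is.
default_out=$(printf '%s\n' "$input" | cut -d' ' -f-3)

# With -s: lines that do not contain the delimiter are dropped.
only_delimited=$(printf '%s\n' "$input" | cut -s -d' ' -f-3)

printf '%s\n' "$default_out" '---' "$only_delimited"
```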

Another example:

$ cat foo.txt
Hi:I:Am:Groot

$ cut -d$':' -f1-3 foo.txt
Hi:I:Am

Problem: Given a sentence, identify and display its fourth word. Assume that the space (' ') is the only delimiter between words.

Solution:


cut -d$' ' -f4

The same works for the colon example:


$ cat foo.txt
Hi:I:Am:Groot

$ cut -d$':' -f4 foo.txt
Groot
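For the fourth-word problem, it helps to see what happens on awkward input. The sketch below uses sentences I made up; it assumes, as the problem does, that a single space is the only delimiter.

```shell
# A normal sentence: the 4th space-separated field is the 4th word.
fourth=$(printf '%s\n' 'The quick brown fox jumps' | cut -d' ' -f4)

# Only two fields: field 4 does not exist, so cut prints an empty line.
short=$(printf '%s\n' 'Good Bye' | cut -d' ' -f4)

# Two spaces after "a" create an empty 2nd field, so field 4 is "c", not "d".
double=$(printf '%s\n' 'a  b c d' | cut -d' ' -f4)

printf '[%s]\n' "$fourth" "$short" "$double"
```

So cut counts delimiters, not words. If the input may contain repeated spaces, squeezing them first (for example with tr -s ' ') is one option.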


Problem: Given a tab-delimited file with several columns (TSV format), print from the second field to the last field.

Solution:

cut -f2- 

Note: The default delimiter is tab, so you DON'T need to specify a delimiter at all when the input is tab-delimited.
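A quick way to check the tab default without a real TSV file is to build one row with printf. The row contents and variable names below are invented for the example.

```shell
# A made-up TSV row; printf turns \t into real tab characters.
row=$(printf 'id\tname\tcity')
tab=$(printf '\t')

rest=$(printf '%s\n' "$row" | cut -f2-)                # default delimiter (tab)
explicit=$(printf '%s\n' "$row" | cut -d"$tab" -f2-)   # same thing, delimiter spelled out

printf '%s\n' "$rest"
```

Both calls should return name and city, still separated by a tab. In Bash you could also write the explicit form as -d$'\t'.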

__________________________________________________________________________________________________

Reference: I took the problems from my favorite coding competition site, HackerRank. Visit this link to practice more problems. Solve the first 9 problems, which are based on the cut command for Bash. Best of luck.
