[Bash] Text file manipulation

Here are some text file manipulation commands with examples of usage. I cover cat, tac, uniq, sort, nl, head, tail, wc, tr and cut.
First, let’s create a file named test.txt with the following content:

1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA

and test2.txt:

1000,aaa,bbb
1000,aaa,bbb
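
One way to create both files without an editor is a here-document. This is only a sketch: the scratch directory and the variable name workdir are assumptions of the example, not part of the original setup.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The quoted 'EOF' stops the shell from expanding anything inside the text.
cat > test.txt <<'EOF'
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
EOF

cat > test2.txt <<'EOF'
1000,aaa,bbb
1000,aaa,bbb
EOF
```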

Secondly, a few words about how commands are combined. When you write:
command1 | command2 | command3 – the standard output of command1 becomes the standard input of command2, and the standard output of command2 becomes the standard input of command3

wojtekrj@wojtek-laptop:~$ echo "test" | cat
test

command1 && command2 – command2 is executed only if command1 succeeds (exits with status 0)

command1 < infile > outfile – the content of infile is the standard input of command1, and the output of command1 is saved to outfile (an existing outfile is overwritten)

wojtekrj@wojtek-laptop:~$ cat < test.txt > test3.txt && cat test3.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
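
The three constructs can be combined freely. The sketch below uses a small hypothetical input file (in.txt, with made-up content) in a scratch directory; the variable names are chosen only for this example.

```shell
workdir=$(mktemp -d)
infile="$workdir/in.txt"
outfile="$workdir/out.txt"
printf '3\n1\n2\n' > "$infile"

# sort reads infile on standard input and writes to outfile;
# cat runs only if sort succeeded.
sort < "$infile" > "$outfile" && cat "$outfile"

# Pipeline: the output of sort is the input of head.
first=$(sort < "$infile" | head -n 1)
echo "$first"
```

The first command prints 1, 2 and 3 on separate lines; the second prints 1.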

These files are used below to show how each command works:

cat – without arguments it simply copies standard input to standard output (if the keyboard is the standard input, press Ctrl+D when you finish; cat will treat it as the end of input). With one argument, it prints the content of the file named by that argument. With more arguments, it prints the contents of the files in the order they appear on the command line.

wojtekrj@wojtek-laptop:~$ cat test.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ cat test2.txt
1000,aaa,bbb
1000,aaa,bbb
wojtekrj@wojtek-laptop:~$ cat test.txt test2.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
1000,aaa,bbb
1000,aaa,bbb
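
cat can also mix standard input with files: the argument - stands for standard input. A minimal sketch, assuming a hypothetical file body.txt created just for the example:

```shell
workdir=$(mktemp -d)
printf 'body1\nbody2\n' > "$workdir/body.txt"

# "-" is replaced by whatever arrives on standard input,
# so the piped line is printed before the file.
combined=$(echo 'header' | cat - "$workdir/body.txt")
printf '%s\n' "$combined"
```

This prints header, then body1 and body2.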

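tac – works like cat, but prints the lines of each file in reverse order (last line first). With several files, each file is reversed separately and the files are still processed in the order given.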

wojtekrj@wojtek-laptop:~$ tac test.txt
7,MMMM,AAA
5,LLLL,ZZZ
6,KKKK,YYY
4,JJJJ,XXX
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
23,EEE,SSS
12,DDD,RRR
21,CCC,PPP
11,BBB,OOO
1,AAAA,NNN
wojtekrj@wojtek-laptop:~$ tac test.txt test2.txt
7,MMMM,AAA
5,LLLL,ZZZ
6,KKKK,YYY
4,JJJJ,XXX
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
23,EEE,SSS
12,DDD,RRR
21,CCC,PPP
11,BBB,OOO
1,AAAA,NNN
1000,aaa,bbb
1000,aaa,bbb

uniq – prints the content of a file, collapsing runs of identical adjacent lines into a single line. Duplicates that are not next to each other are left alone.

wojtekrj@wojtek-laptop:~$ uniq test.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ uniq test2.txt
1000,aaa,bbb
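
Because uniq only looks at adjacent lines, it is usually combined with sort. The sketch below uses made-up sample data in which the duplicates are scattered; the variable names are assumptions of the example.

```shell
# Hypothetical sample: "b" appears twice, but not on adjacent lines.
data=$(printf 'b\na\nb\na\na\n')

# uniq alone collapses only the two adjacent "a" lines at the end.
adjacent_only=$(printf '%s\n' "$data" | uniq)

# Sorting first brings all duplicates together, so uniq removes them all.
all_unique=$(printf '%s\n' "$data" | sort | uniq)

# uniq -c prefixes each line with the number of times it occurred.
counts=$(printf '%s\n' "$data" | sort | uniq -c)

printf '%s\n' "$adjacent_only" '---' "$all_unique" '---' "$counts"
```

The exact padding in front of the counts printed by uniq -c differs between implementations.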

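sort – sorts the lines of a file. By default, lines are compared as text using the collation rules of the current locale. That is why 11,BBB,OOO comes before 1,AAAA,NNN in the first example below: many locales ignore punctuation such as the comma on the first comparison pass. With LC_ALL=C the comparison is byte by byte and 1,AAAA,NNN would come first. If the command is executed with the -n option, it compares the numbers at the beginning of each line.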

wojtekrj@wojtek-laptop:~$ sort test.txt
11,BBB,OOO
12,DDD,RRR
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
1,AAAA,NNN
21,CCC,PPP
23,EEE,SSS
4,JJJJ,XXX
5,LLLL,ZZZ
6,KKKK,YYY
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ sort -n test.txt
1,AAAA,NNN
4,JJJJ,XXX
5,LLLL,ZZZ
6,KKKK,YYY
7,MMMM,AAA
11,BBB,OOO
12,DDD,RRR
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
21,CCC,PPP
23,EEE,SSS
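
sort can also compare a single field instead of the whole line. The following sketch takes three lines from test.txt and forces the C locale so the result does not depend on the system settings; the variable names are assumptions of the example.

```shell
data=$(printf '7,MMMM,AAA\n1,AAAA,NNN\n11,BBB,OOO\n')

# Byte-by-byte comparison: "," sorts before the digits in ASCII,
# so "1,AAAA" comes before "11,BBB".
bytewise=$(printf '%s\n' "$data" | LC_ALL=C sort)

# -t sets the field separator, -k 3,3 limits the sort key to the third field.
by_third=$(printf '%s\n' "$data" | LC_ALL=C sort -t , -k 3,3)

# -n compares leading numbers, -r reverses the order: largest number first.
numeric_desc=$(printf '%s\n' "$data" | LC_ALL=C sort -n -r)

printf '%s\n' "$bytewise" '---' "$by_third" '---' "$numeric_desc"
```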

nl – adds a line number before each line of a file (by default, empty lines are not numbered)

wojtekrj@wojtek-laptop:~$ nl  test.txt
1	1,AAAA,NNN
2	11,BBB,OOO
3	21,CCC,PPP
4	12,DDD,RRR
5	23,EEE,SSS
6	14,FFF,TTT
7	14,FFF,TTT
8	14,FFF,TTT
9	4,JJJJ,XXX
10	6,KKKK,YYY
11	5,LLLL,ZZZ
12	7,MMMM,AAA

head – prints the first 10 lines of a file. If it is executed with the option -N, where N is a number, it prints the first N lines of the file (the standard spelling of this option is -n N)

wojtekrj@wojtek-laptop:~$ head test.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
wojtekrj@wojtek-laptop:~$ head -3 test.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
wojtekrj@wojtek-laptop:~$ head -1 test.txt
1,AAAA,NNN

tail – similar to head, but prints the last lines of a file (10 by default)

wojtekrj@wojtek-laptop:~$ tail test.txt
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ tail -3 test.txt
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ tail -1 test.txt
7,MMMM,AAA
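
Piping head into tail extracts a range of lines from the middle of a file. A short sketch with made-up five-line input; the variable names are assumptions of the example.

```shell
data=$(printf 'line1\nline2\nline3\nline4\nline5\n')

# Lines 2 to 4: keep the first 4 lines, then the last 3 of those.
middle=$(printf '%s\n' "$data" | head -n 4 | tail -n 3)

# tail -n +N starts printing at line N instead of counting from the end.
from_third=$(printf '%s\n' "$data" | tail -n +3)

printf '%s\n' "$middle" '---' "$from_third"
```

The first result is line2 to line4, the second is line3 to line5.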

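wc – counts the lines, words and bytes of a file and prints them in that order. It can print a single statistic: lines (option -l), words (-w) or bytes (-c). For plain ASCII text such as test.txt, the number of bytes equals the number of characters.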

wojtekrj@wojtek-laptop:~$ wc test.txt
12  12 132 test.txt
wojtekrj@wojtek-laptop:~$ wc -w test.txt
12 test.txt
wojtekrj@wojtek-laptop:~$ wc -l test.txt
12 test.txt
wojtekrj@wojtek-laptop:~$ wc -c test.txt
132 test.txt
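
In scripts you usually want only the number, without the file name. Feeding the file through standard input does that. A minimal sketch, assuming a hypothetical file sample.txt:

```shell
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/sample.txt"

# wc reads standard input here, so there is no file name to print.
line_count=$(wc -l < "$workdir/sample.txt")
word_count=$(wc -w < "$workdir/sample.txt")
byte_count=$(wc -c < "$workdir/sample.txt")

# Arithmetic expansion drops the padding some wc implementations add.
echo "lines=$((line_count)) words=$((word_count)) bytes=$((byte_count))"
```

This prints lines=2 words=3 bytes=14.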

tr – reads from standard input and writes to standard output, deleting or replacing some characters along the way:
tr string1 string2 replaces the first character of string1 with the first character of string2, the second with the second, the third with the third, etc.
tr -d string1 deletes every occurrence of the characters from string1
tr -cd string1 deletes every occurrence of characters which are NOT in string1 (in the example below this includes the newlines, so the output is a single line)

wojtekrj@wojtek-laptop:~$ cat test2.txt | tr abc ABC
1000,AAA,BBB
1000,AAA,BBB
wojtekrj@wojtek-laptop:~$ cat test2.txt | tr -d abc
1000,,
1000,,
wojtekrj@wojtek-laptop:~$ cat test2.txt | tr -cd abc
aaabbbaaabbb
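
tr also accepts character ranges and escape sequences such as \t and \n, and the -s option squeezes repeated characters. A sketch based on a line from test2.txt; the variable names and the 'a,,,b,,c' string are made up for the example.

```shell
# Ranges: every lowercase letter is mapped to its uppercase counterpart.
upper=$(echo '1000,aaa,bbb' | tr 'a-z' 'A-Z')

# Escape sequences: turn every comma into a tab.
tabs=$(echo '1000,aaa,bbb' | tr ',' '\t')

# -s squeezes each run of commas into a single comma.
squeezed=$(echo 'a,,,b,,c' | tr -s ',')

# Keep only digits and newlines, delete everything else.
digits=$(echo '1000,aaa,bbb' | tr -cd '0-9\n')

printf '%s\n' "$upper" "$tabs" "$squeezed" "$digits"
```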

cut – prints only a part of each line. Each line is treated as a sequence of fields separated by a special character called the separator.
Let’s consider the line:

1,AAAA,NNN

The separator is the character “,” and there are 3 fields: the first is “1”, the second is “AAAA” and the third is “NNN”. cut has the following syntax:
cut -d C -f N test.txt where C is the separator and N is the number of the field to display

wojtekrj@wojtek-laptop:~$ cut -d , -f 1-2 test.txt
1,AAAA
11,BBB
21,CCC
12,DDD
23,EEE
14,FFF
14,FFF
14,FFF
4,JJJJ
6,KKKK
5,LLLL
7,MMMM
wojtekrj@wojtek-laptop:~$ cut -d , -f 1,3 test.txt
1,NNN
11,OOO
21,PPP
12,RRR
23,SSS
14,TTT
14,TTT
14,TTT
4,XXX
6,YYY
5,ZZZ
7,AAA

If you write N-M after -f, the fields from N to M will be printed. If you write N,M, the N-th and M-th fields will be printed.
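
cut fits naturally into pipelines with the other commands. The sketch below works on three lines taken from test.txt; the variable names are assumptions of the example.

```shell
data=$(printf '14,FFF,TTT\n14,FFF,TTT\n1,AAAA,NNN\n')

# A single field: only the second column.
second=$(printf '%s\n' "$data" | cut -d , -f 2)

# An open range: everything from the second field to the end of the line.
from_second=$(printf '%s\n' "$data" | cut -d , -f 2-)

# Distinct values of the second column: cut, then sort, then uniq.
distinct=$(printf '%s\n' "$data" | cut -d , -f 2 | LC_ALL=C sort | uniq)

printf '%s\n' "$second" '---' "$from_second" '---' "$distinct"
```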
