[Bash] Text file manipulation
This post presents some text file manipulation commands with examples of their usage. It covers cat, tac, uniq, sort, nl, head, tail, wc, tr and cut.
First, let’s create a file named test.txt with the following content:
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
and test2.txt:
1000,aaa,bbb
1000,aaa,bbb
Secondly, when you write:
command1 | command2 | command3 – the standard output of command1 is the standard input of command2, and the standard output of command2 is the standard input of command3
wojtekrj@wojtek-laptop:~$ echo "test" | cat
test
command1 && command2 – if command1 finishes successfully (exit status 0), the shell runs command2; otherwise command2 is skipped
command1 < infile > outfile – the content of infile is the standard input of command1, and the output of command1 is saved to outfile
wojtekrj@wojtek-laptop:~$ cat < test.txt > test3.txt && cat test3.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
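The three constructs above can be combined in one short script. This is only a minimal sketch: the scratch directory, the file names sample.txt and copy.txt, and the variable names are assumptions made for illustration, not part of the original examples.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small input file (contents are illustrative).
printf '%s\n' '1,AAAA,NNN' '11,BBB,OOO' '21,CCC,PPP' > sample.txt

# Pipeline: cat's output is head's input, head's output is wc's input.
first_two_count=$(cat sample.txt | head -n 2 | wc -l)

# Redirection plus &&: the second cat runs only if the copy succeeded.
copied=$(cat < sample.txt > copy.txt && cat copy.txt)

# `false` always fails, so the echo after && is skipped;
# the final echo runs regardless because it follows a semicolon.
skipped=$(false && echo "ran"; echo "done")

echo "lines through pipeline: $first_two_count"
echo "$copied"
echo "after failure: $skipped"
```

The last line prints "after failure: done", which shows that the command after && never ran.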
These files will be used to show how the commands work:
cat – without arguments, it copies standard input to standard output (if standard input is the keyboard, press Ctrl+D on an empty line when you finish; this signals end of file (EOF) to cat). With one argument, it prints the content of the file named by that argument. With more arguments, it prints the contents of the files in the order they appear as arguments.
wojtekrj@wojtek-laptop:~$ cat test.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ cat test2.txt
1000,aaa,bbb
1000,aaa,bbb
wojtekrj@wojtek-laptop:~$ cat test.txt test2.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
1000,aaa,bbb
1000,aaa,bbb
tac – works like cat, but prints the lines of each file in reverse order (the files themselves are still processed in argument order)
wojtekrj@wojtek-laptop:~$ tac test.txt
7,MMMM,AAA
5,LLLL,ZZZ
6,KKKK,YYY
4,JJJJ,XXX
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
23,EEE,SSS
12,DDD,RRR
21,CCC,PPP
11,BBB,OOO
1,AAAA,NNN
wojtekrj@wojtek-laptop:~$ tac test.txt test2.txt
7,MMMM,AAA
5,LLLL,ZZZ
6,KKKK,YYY
4,JJJJ,XXX
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
23,EEE,SSS
12,DDD,RRR
21,CCC,PPP
11,BBB,OOO
1,AAAA,NNN
1000,aaa,bbb
1000,aaa,bbb
uniq – prints the content of a file, collapsing each run of adjacent identical lines into a single line
wojtekrj@wojtek-laptop:~$ uniq test.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ uniq test2.txt
1000,aaa,bbb
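Because uniq only compares neighbouring lines, duplicates that are not adjacent survive. A common way around this is to sort first. The sketch below assumes a hypothetical file letters.txt with made-up content; the -c option shown at the end is a standard uniq option that prefixes each line with its number of occurrences.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Illustrative data: the first "a" is separated from the other two by "b".
printf '%s\n' 'a' 'b' 'a' 'a' > letters.txt

# Only the adjacent pair at the end is collapsed, so "a" still appears twice.
adjacent_only=$(uniq letters.txt)

# Sorting first makes all duplicates adjacent, so each line appears once.
all_unique=$(sort letters.txt | uniq)

# -c prefixes each output line with how many times it occurred.
counts=$(sort letters.txt | uniq -c)

echo "$adjacent_only"
echo "---"
echo "$all_unique"
echo "---"
echo "$counts"
```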
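sort – sorts the lines of a file in lexicographical order (the exact ordering depends on the current locale, which is why "11,BBB,OOO" can appear before "1,AAAA,NNN" below). If the command is executed with the -n option, it compares the numbers at the beginning of each line.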
wojtekrj@wojtek-laptop:~$ sort test.txt
11,BBB,OOO
12,DDD,RRR
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
1,AAAA,NNN
21,CCC,PPP
23,EEE,SSS
4,JJJJ,XXX
5,LLLL,ZZZ
6,KKKK,YYY
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ sort -n test.txt
1,AAAA,NNN
4,JJJJ,XXX
5,LLLL,ZZZ
6,KKKK,YYY
7,MMMM,AAA
11,BBB,OOO
12,DDD,RRR
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
21,CCC,PPP
23,EEE,SSS
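The sketch below contrasts three orderings on a tiny, made-up file (sample.txt and its content are assumptions for illustration). It uses LC_ALL=C to force plain byte-wise comparison, and the standard -t and -k options to sort on one comma-separated field instead of the whole line.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '11,BBB,OOO' '1,AAAA,ZZZ' '4,JJJJ,AAA' > sample.txt

# Byte-wise order: ',' sorts before the digits, so "1," precedes "11".
bytewise=$(LC_ALL=C sort sample.txt)

# Numeric order on the leading number: 1, 4, 11.
numeric=$(sort -n sample.txt)

# Sort on the third field only: -t sets the separator, -k 3,3 the key.
by_third=$(LC_ALL=C sort -t , -k 3,3 sample.txt)

echo "$bytewise"
echo "---"
echo "$numeric"
echo "---"
echo "$by_third"
```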
nl – adds a line number before each line of a file
wojtekrj@wojtek-laptop:~$ nl test.txt
     1  1,AAAA,NNN
     2  11,BBB,OOO
     3  21,CCC,PPP
     4  12,DDD,RRR
     5  23,EEE,SSS
     6  14,FFF,TTT
     7  14,FFF,TTT
     8  14,FFF,TTT
     9  4,JJJJ,XXX
    10  6,KKKK,YYY
    11  5,LLLL,ZZZ
    12  7,MMMM,AAA
head – prints the first 10 lines of a file. If it is executed with the option -N, where N is a number, it prints the first N lines of the file (the equivalent standard form is -n N)
wojtekrj@wojtek-laptop:~$ head test.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
wojtekrj@wojtek-laptop:~$ head -3 test.txt
1,AAAA,NNN
11,BBB,OOO
21,CCC,PPP
wojtekrj@wojtek-laptop:~$ head -1 test.txt
1,AAAA,NNN
tail – similar to head, but prints the last lines of a file (10 by default)
wojtekrj@wojtek-laptop:~$ tail test.txt
21,CCC,PPP
12,DDD,RRR
23,EEE,SSS
14,FFF,TTT
14,FFF,TTT
14,FFF,TTT
4,JJJJ,XXX
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ tail -3 test.txt
6,KKKK,YYY
5,LLLL,ZZZ
7,MMMM,AAA
wojtekrj@wojtek-laptop:~$ tail -1 test.txt
7,MMMM,AAA
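head and tail can be chained to pick a range of lines from the middle of a file. The following is a small sketch on a hypothetical five-line file (lines.txt is an assumed name); it also shows the standard tail -n +N form, which starts printing at line N rather than counting from the end.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'line1' 'line2' 'line3' 'line4' 'line5' > lines.txt

# Lines 2 to 4: keep the first 4 lines, then the last 3 of those.
middle=$(head -n 4 lines.txt | tail -n 3)

# tail -n +3 prints everything from line 3 to the end of the file.
from_third=$(tail -n +3 lines.txt)

echo "$middle"
echo "---"
echo "$from_third"
```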
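wc – counts the lines, words and bytes of a file and prints them in that order. It can also print a single statistic: bytes (option -c), words (-w) or lines (-l). For plain ASCII text such as test.txt, the number of bytes equals the number of characters.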
wojtekrj@wojtek-laptop:~$ wc test.txt
12 12 132 test.txt
wojtekrj@wojtek-laptop:~$ wc -w test.txt
12 test.txt
wojtekrj@wojtek-laptop:~$ wc -l test.txt
12 test.txt
wojtekrj@wojtek-laptop:~$ wc -c test.txt
132 test.txt
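When the count is needed in a script, the file name in the output gets in the way. One way to avoid it is to feed the file through standard input, as in this sketch (sample.txt and the variable names are assumptions for illustration). Some wc implementations pad the number with leading spaces, so the values are best compared numerically.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1,AAAA,NNN' '11,BBB,OOO' > sample.txt

# With a file argument, wc appends the file name to its output.
with_name=$(wc -l sample.txt)

# Reading from standard input leaves only the number.
line_count=$(wc -l < sample.txt)

# -c counts bytes: each 10-character line plus its newline is 11 bytes.
byte_count=$(wc -c < sample.txt)

echo "with name: $with_name"
echo "lines: $line_count, bytes: $byte_count"
```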
tr – reads from standard input and writes to standard output, deleting or replacing some characters:
tr string1 string2 replaces the first character of string1 with the first character of string2, the second with the second, the third with the third, etc.
tr -d string1 deletes every occurrence of the characters in string1
tr -cd string1 deletes every character that is NOT in string1 (including newlines)
wojtekrj@wojtek-laptop:~$ cat test2.txt | tr abc ABC
1000,AAA,BBB
1000,AAA,BBB
wojtekrj@wojtek-laptop:~$ cat test2.txt | tr -d abc
1000,,
1000,,
wojtekrj@wojtek-laptop:~$ cat test2.txt | tr -cd abc
aaabbbaaabbb
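tr also accepts character ranges such as a-z, and has a standard -s option that squeezes repeated characters. The sketch below applies them to one line taken from test2.txt; the variable names are assumptions for illustration.

```shell
input='1000,aaa,bbb'

# Ranges: map every lowercase ASCII letter to its uppercase counterpart.
upper=$(printf '%s\n' "$input" | tr 'a-z' 'A-Z')

# -s squeezes each run of a listed character down to one occurrence.
squeezed=$(printf '%s\n' "$input" | tr -s 'ab0')

# -d with a range: remove all digits.
no_digits=$(printf '%s\n' "$input" | tr -d '0-9')

echo "$upper"
echo "$squeezed"
echo "$no_digits"
```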
cut – prints only part of each line. Each line is treated as a sequence of fields separated by a special character called the separator.
Let’s consider the line:
1,AAAA,NNN
The separator is the character “,” and there are 3 fields: the first is “1”, the second is “AAAA” and the third is “NNN”. cut has the following syntax:
cut -d C -f N test.txt where C is the separator and N is the number of the field to display
wojtekrj@wojtek-laptop:~$ cut -d , -f 1-2 test.txt
1,AAAA
11,BBB
21,CCC
12,DDD
23,EEE
14,FFF
14,FFF
14,FFF
4,JJJJ
6,KKKK
5,LLLL
7,MMMM
wojtekrj@wojtek-laptop:~$ cut -d , -f 1,3 test.txt
1,NNN
11,OOO
21,PPP
12,RRR
23,SSS
14,TTT
14,TTT
14,TTT
4,XXX
6,YYY
5,ZZZ
7,AAA
If you write N-M after -f, fields N through M will be printed. If you write N,M, the N-th and M-th fields will be printed.
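cut combines naturally with the other commands in this post. The following sketch lists the distinct values of one field and uses the open-ended range N-, which selects field N through the last field. The file sample.txt and its four lines are assumptions for illustration, borrowed from the style of test.txt.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '14,FFF,TTT' '4,JJJJ,XXX' '14,FFF,TTT' '7,MMMM,AAA' > sample.txt

# Distinct values of the second field: extract, sort, drop duplicates.
distinct_second=$(cut -d , -f 2 sample.txt | sort | uniq)

# Open-ended range: field 2 through the last field, first line only.
from_second=$(cut -d , -f 2- sample.txt | head -n 1)

echo "$distinct_second"
echo "---"
echo "$from_second"
```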