Friday, June 12, 2015

Awk Command in Unix:

Awk tool:

Awk is one of the most powerful tools in Unix for processing the rows and columns of a file. Awk has built-in string functions and associative arrays, and it supports most of the operators, conditional blocks, and loops available in the C language.

A further convenience is that Awk scripts can be converted into Perl scripts with the a2p utility, which was bundled with older Perl releases.

The basic syntax of AWK:

awk 'BEGIN {start_action} {action} END {stop_action}' filename

awk -F: '$3 >= 500 {print $1}' /etc/passwd | more

The above command lists only regular users, not system users, on systems where regular user IDs start at 500 (many current distributions start at 1000 instead).

In the basic syntax above, the actions in the BEGIN block are performed before the file is processed and the actions in the END block are performed after the file is processed. The rest of the actions are performed once for each line while the file is being processed.
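
The three phases can be seen together in a minimal sketch. The sample records and the min_uid variable are made up for illustration, and the threshold of 1000 is an assumption that should be adjusted to the system in question.

```shell
# Hypothetical sample records in /etc/passwd format (name:password:UID:...).
passwd_sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/bash')

# BEGIN runs once before any input is read, the middle block runs for
# every line whose UID passes the test, and END runs once after the last line.
report=$(printf '%s\n' "$passwd_sample" | awk -F: -v min_uid=1000 '
  BEGIN { print "Regular users:" }
  $3 >= min_uid { print "  " $1; count++ }
  END { printf "%d found\n", count }
')

printf '%s\n' "$report"
```

Here the pattern $3 >= min_uid in front of the action replaces the explicit if statement; both forms select the same lines.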

Examples:

Create a file input_file with the following data. This file can be easily created using the output of ls -l.

-rw-r--r-- 1 center center  0 Dec  8 21:39 p1
-rw-r--r-- 1 center center 17 Dec  8 21:15 t1
-rw-r--r-- 1 center center 26 Dec  8 21:38 t2
-rw-r--r-- 1 center center 25 Dec  8 21:38 t3
-rw-r--r-- 1 center center 43 Dec  8 21:39 t4
-rw-r--r-- 1 center center 48 Dec  8 21:39 t5

From the data, you can observe that this file has rows and columns. The rows are separated by a newline character and the columns are separated by space characters. We will use this file as the input for the examples discussed here.

1. awk '{print $1}' input_file

Here $1 has a special meaning: $1, $2, $3... represent the first, second, third... columns of a row respectively. This awk command prints the first column of each row, as shown below.

-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--

To print the 4th and 5th columns in a file use awk '{print $4,$5}' input_file

Here the BEGIN and END blocks are not used, so the print statement is executed for each row read from the file. The next example shows how to use the BEGIN and END blocks.

2. awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}' input_file

This prints the sum of the values in the 5th column. In the BEGIN block the variable sum is assigned the value 0. In the next block the value of the 5th column is added to the sum variable, and this addition is repeated for every row processed. When all the rows have been processed, the sum variable holds the sum of the values in the 5th column. This value is printed in the END block.

3. In this example we will see how to execute the awk script written in a file. Create a file sum_column and paste the below script in that file

#!/usr/bin/awk -f
BEGIN {sum=0} 
{sum=sum+$5} 
END {print sum}

Now execute the script using the awk command as

awk -f sum_column input_file

This runs the script in the sum_column file and displays the sum of the 5th column of input_file.

4. awk '{ if($9 == "t4") print $0;}' input_file

This awk command checks for the string "t4" in the 9th column and if it finds a match then it will print the entire line. The output of this awk command is

-rw-r--r-- 1 center center 43 Dec  8 21:39 t4

5. awk 'BEGIN { for(i=1;i<=5;i++) print "square of", i, "is",i*i; }'

This prints the squares of the numbers from 1 to 5. The output of the command is

square of 1 is 1
square of 2 is 4
square of 3 is 9
square of 4 is 16
square of 5 is 25

Notice that the syntax of “if” and “for” is similar to that of the C language.
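
The associative arrays and string functions mentioned at the start have not appeared in the examples so far. The following is a minimal sketch that uses the same sample listing: it counts how many files share each modification time (the 8th column) and upper-cases one file name. The variable names are chosen for illustration only.

```shell
# The sample listing used throughout, held in a shell variable.
listing=$(printf '%s\n' \
  '-rw-r--r-- 1 center center  0 Dec  8 21:39 p1' \
  '-rw-r--r-- 1 center center 17 Dec  8 21:15 t1' \
  '-rw-r--r-- 1 center center 26 Dec  8 21:38 t2' \
  '-rw-r--r-- 1 center center 25 Dec  8 21:38 t3' \
  '-rw-r--r-- 1 center center 43 Dec  8 21:39 t4' \
  '-rw-r--r-- 1 center center 48 Dec  8 21:39 t5')

# count[$8]++ uses the time string as the array key.
# "for (t in count)" visits keys in no particular order, so sort the result.
counts=$(printf '%s\n' "$listing" |
  awk '{ count[$8]++ } END { for (t in count) print t, count[t] }' | sort)

# toupper() and length() are two of the built-in string functions.
upper=$(printf '%s\n' "$listing" | awk 'NR == 1 { print toupper($9), length($9) }')

printf '%s\n' "$counts" "$upper"
```

The array needs no declaration or size; an element comes into existence the first time its key is used.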

Awk Built in Variables: 

You have already seen $0, $1, $2... which hold the entire line, the first column, the second column... respectively. Now we will see other built-in variables with examples.

FS - Input field separator variable:

So far, we have seen fields separated by space characters. By default, Awk assumes that the fields in a file are separated by whitespace (spaces or tabs). If the fields are separated by any other character, we can use the FS variable to specify the delimiter.

6. awk 'BEGIN {FS=":"} {print $2}' input_file
OR
awk -F: '{print $2}' input_file

This will print the result as 
39 p1
15 t1
38 t2
38 t3
39 t4
39 t5

OFS - Output field separator variable: 

By default, when fields are listed with commas in a print statement, they are displayed with a space character as the delimiter. For example:

7. awk '{print $4,$5}' input_file

The output of this command will be

center 0
center 17
center 26
center 25
center 43
center 48

We can change this default behavior using the OFS variable as

awk 'BEGIN {OFS=":"} {print $4,$5}' input_file

center:0
center:17
center:26
center:25
center:43
center:48

Note: print $4,$5 and print $4$5 do not behave the same way. The first displays the fields separated by the output field separator (a space by default). The second concatenates the fields with no delimiter at all.
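
The difference can be checked on a single line of the sample listing. This is a small sketch; the third command additionally assumes the standard awk behaviour that assigning to any field makes awk rebuild the whole line with OFS.

```shell
line='-rw-r--r-- 1 center center 17 Dec  8 21:15 t1'

# Comma: the fields are joined with OFS.
with_comma=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = ":" } { print $4, $5 }')

# No comma: the fields are concatenated and OFS is not used.
no_comma=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = ":" } { print $4 $5 }')

# Assigning to a field ($1 = $1) rebuilds $0 with OFS between all fields.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = ":" } { $1 = $1; print }')

printf '%s\n' "$with_comma" "$no_comma" "$rebuilt"
```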

NF - Number of fields variable:

NF holds the number of fields in the current line.

8. awk '{print NF}' input_file
This will display the number of columns in each row.

NR - Number of records variable:
NR holds the number of records read so far, so it gives the current line number and, in the END block, the total count of lines in the file.

9. awk '{print NR}' input_file
This will display the line numbers from 1.

10. awk 'END {print NR}' input_file
This will display the total number of lines in the file.
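
NR and NF are often combined. The sketch below, run on three lines of the sample listing, prints each line number with the last field, written $NF, and then uses NR as a pattern to pick out a single line.

```shell
listing=$(printf '%s\n' \
  '-rw-r--r-- 1 center center  0 Dec  8 21:39 p1' \
  '-rw-r--r-- 1 center center 17 Dec  8 21:15 t1' \
  '-rw-r--r-- 1 center center 26 Dec  8 21:38 t2')

# $NF is the field whose number is NF, that is, the last field of the line.
numbered=$(printf '%s\n' "$listing" |
  awk '{ print NR ": " $NF " (" NF " fields)" }')

# NR can also be used as a pattern: act on the second line only.
second=$(printf '%s\n' "$listing" | awk 'NR == 2 { print $NF }')

printf '%s\n' "$numbered" "$second"
```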

Thursday, June 11, 2015

Files and folders with ls

This article takes a closer look at the file listing command in Linux, with relevant examples.

$ ls
$ echo *

The above commands list the files and folders in the current directory.

$ echo */

This lists only directories.

List all the files within a directory, including hidden files (also known as dot files):

# ls -a

List all the files within a directory, including hidden files, but without the implied ‘.’ and ‘..’ entries:

# ls -A

Print the contents of a directory in long listing format:

# ls -l

Example:

drwxr-xr-x  5 muthu linuxinterest      4096 Sep 30 11:31 Binary

Here, drwxr-xr-x shows the file type followed by the permissions for owner, group and world. The owner has Read (r), Write (w) and Execute (x) permission. The group to which this file belongs has Read (r) and Execute (x) permission but not Write (w) permission, and the same permissions apply to the world (all other users).

The initial ‘d’ means it is a directory.
The number ‘5’ is the hard link count.
The directory Binary belongs to user muthu and group linuxinterest.
Sep 30 11:31 is the date and time it was last modified.

Print the contents of a directory in long listing format, including hidden/dot files:

# ls -la

Setting up the alias 'll' for 'ls -l'

Add alias ll='ls -l' to .bashrc

Log out and log in again, then try issuing the command ll.

List files in long format without the name of the owner, using the switch (-g):

# ls -g

List files in long format without the name of the group, using the switch (-G) together with the switch (-l):

# ls -Gl

Print the size of files and folders in the current directory in human-readable format:

# ls -hl
# ls -hs

(lowercase h and lowercase s)

The switch (-h) reports sizes in powers of 1024. Powers of 1000 are also supported.

The switch --si is similar to the switch -h. The only difference is that --si uses powers of 1000, whereas -h uses powers of 1024.

# ls -l --si

Print the contents of a directory separated by commas:

# ls -m

Example:

 muthu, kumar, linux, test

Print the contents in reverse order:

# ls -rl

Print the sub-directories recursively:

# ls -R

Sort the files by size, largest first:

# ls -S

To sort the files by size in ascending order, with the smallest file listed first and the largest last:

# ls -Sr
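
The two sort orders are easy to confirm in a scratch directory. This is a sketch only: the directory comes from mktemp and the three file names are made up for the demonstration.

```shell
# Create three files of known size in a temporary directory.
demo_dir=$(mktemp -d)
printf 'a'        > "$demo_dir/small"    # 1 byte
printf '%10s' ''  > "$demo_dir/medium"   # 10 bytes (spaces)
printf '%1000s' '' > "$demo_dir/large"   # 1000 bytes (spaces)

# -S sorts by size with the largest first; adding -r reverses the order.
largest_first=$(ls -S "$demo_dir")
smallest_first=$(ls -Sr "$demo_dir")

printf 'ls -S:\n%s\n\nls -Sr:\n%s\n' "$largest_first" "$smallest_first"
rm -rf "$demo_dir"
```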

List the contents of a directory one file per line, with no additional information.

This is achieved by using the digit one (1):

# ls -1

Example:

muthu
kumar
test
linux

Tuesday, June 9, 2015

Find with -exec option

Difference between {} \; and {} \+ and | xargs

The three commands below produce the same result, but the first takes a little longer and formats its output slightly differently.

find . -type f -exec file {} \;
find . -type f -exec file {} \+
find . -type f | xargs file

This is because the first one runs the file command once for every file that find returns. So, basically, it runs as:

file file1.txt
file file2.txt

The other two commands (find with -exec ... \+ and find piped to xargs) run the file command once for all files, like below:

file file1.txt file2.txt
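
The number of invocations can be made visible by substituting echo for file, since each echo invocation prints exactly one line. This is a sketch in a scratch directory with made-up file names; the -print0 and -0 options are assumed to be available (they are in GNU and BSD find and xargs, but are not required by older standards).

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/file1.txt" "$demo_dir/file2.txt" "$demo_dir/file 3.txt"

# Helper: count the lines arriving on standard input.
count_lines() { awk 'END { print NR }'; }

# \; runs echo once per file: three invocations, three lines.
per_file=$(cd "$demo_dir" && find . -type f -exec echo {} \; | count_lines)

# + passes all names to a single echo: one line.
batched=$(cd "$demo_dir" && find . -type f -exec echo {} + | count_lines)

# Plain xargs splits on blanks, so "file 3.txt" becomes two arguments.
split_args=$(cd "$demo_dir" && find . -type f | xargs -n 1 echo | count_lines)

# NUL-separated names survive intact and still need only one echo.
via_xargs=$(cd "$demo_dir" && find . -type f -print0 | xargs -0 echo | count_lines)

echo "per file: $per_file, batched: $batched, split: $split_args, xargs -0: $via_xargs"
rm -rf "$demo_dir"
```

The third result shows why the plain pipe to xargs is only safe when file names contain no spaces or newlines.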

The following command moves the found files to another location.

find . -type f -iname '*.cpp' -exec mv {} ./test/ \;

To list the files under the current directory that contain the text "chrome":

find . -type f -exec grep -l chrome {} \;

Delete Files Older Than x Days on Linux

find /path/to/files* -mtime +5 -exec rm {} \;

find /path/to/files* -mtime +5 -exec ls -lrt {} \;

cd /var/tmp && find stuff -mtime +90 -exec /bin/rm {} \+

Note that there are spaces between rm, {}, and \;

  • -mtime +60 matches files last modified more than 60 days ago.
  • -mtime -60 matches files last modified less than 60 days ago.
  • -mtime 60 without + or - matches files last modified exactly 60 days ago (the age is counted in whole days).

To display the contents of .txt files that were modified within the last 60 days, use

$ find /home/you -iname "*.txt" -mtime -60 -exec cat {} \;

To count those files, pipe the list to the wc command:

$ find /home/you -iname "*.txt" -mtime -60 | wc -l

You can also search by access time with -atime. The following command prints all PDF files that were last accessed exactly 60 days ago:

$ find /home/you -iname "*.pdf" -atime 60 -print
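
The age tests can be tried safely on files with artificial timestamps before they are combined with rm. The sketch below assumes GNU touch, whose -d option accepts relative dates such as '70 days ago'; the directory and file names are made up.

```shell
demo_dir=$(mktemp -d)
touch -d '70 days ago' "$demo_dir/old.txt"      # assumes GNU touch -d
touch -d '10 days ago' "$demo_dir/recent.txt"
touch "$demo_dir/new.txt"

# +60: modified more than 60 days ago.
older=$(cd "$demo_dir" && find . -type f -mtime +60 | sort)

# -60: modified less than 60 days ago.
newer=$(cd "$demo_dir" && find . -type f -mtime -60 | sort)

printf 'older than 60 days:\n%s\n\nnewer than 60 days:\n%s\n' "$older" "$newer"
rm -rf "$demo_dir"
```

Running the find command with -print or -exec ls first, and only then switching to -exec rm, is a cautious way to check what would be deleted.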

Print the names of files whose names contain "bills":

find . -name '*bills*' -print

This prints all matching files, for example:

./may/batch_bills_123.log
./april/batch_bills_456.log

List the files sorted by size

ls -lS /path/to/folder/

Note the capital S.

This sorts the files by size, largest first.

To exclude directories:

ls -lS | grep -v '^d'

List only regular files 

ls -lS | grep '^-'

Filter with specified size:

find . -type f -size +100k | grep '\.txt$'

find . -type f -size +100k | grep '\.png$\|\.jpg$'

The above could also be rewritten as

find . -type f -size +100k \( -name "*.png" -o -name "*.jpg" \)
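
When -o is used, the -name alternatives need to be grouped with escaped parentheses, because -o binds more loosely than the implied "and" between the other tests. The following sketch, in a scratch directory with made-up file names, shows what happens with and without the grouping.

```shell
demo_dir=$(mktemp -d)
printf '%204800s' '' > "$demo_dir/big.png"    # 200 KiB
printf '%204800s' '' > "$demo_dir/big.txt"    # 200 KiB, wrong extension
printf 'x'           > "$demo_dir/small.jpg"  # 1 byte

# Without parentheses this reads as:
#   (-type f AND -size +100k AND -name '*.png') OR (-name '*.jpg')
# so the size test is not applied to .jpg files.
ungrouped=$(cd "$demo_dir" &&
  find . -type f -size +100k -name '*.png' -o -name '*.jpg' | sort)

# With parentheses the size test applies to both extensions.
grouped=$(cd "$demo_dir" &&
  find . -type f -size +100k \( -name '*.png' -o -name '*.jpg' \) | sort)

printf 'ungrouped:\n%s\n\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -rf "$demo_dir"
```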

To list all files over 20MB in the current directory 

find . -size +20M

To get the 10 largest files:

ls -lhS | head -n 11

(The first line of ls -l output is the total, so 11 lines show 10 entries.)



Friday, June 5, 2015

grep search options


Description

grep searches the named input FILEs (or standard input if no files are named, or if a single dash ("-") is given as the file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.

In addition, three variant programs egrep, fgrep and rgrep are available:

egrep is the same as grep -E.
fgrep is the same as grep -F.
rgrep is the same as grep -r.

Search for two different words using grep:

$ egrep -w 'word1|word2' /path/to/file

Search for more words using the command below:

ls -lrt | grep -i "fillter\|new\|php"
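
The different ways of searching for several words can be compared on a few lines of made-up text. This is a sketch only; it uses grep -E in place of egrep, which the description above lists as equivalent.

```shell
sample=$(printf '%s\n' \
  'php is new' \
  'renewal notice' \
  'filter list' \
  'nothing here')

# Extended regular expression: | separates the alternatives.
either=$(printf '%s\n' "$sample" | grep -E 'new|filter')

# -w keeps only whole-word matches, so "renewal" no longer counts as "new".
whole_word=$(printf '%s\n' "$sample" | grep -w -E 'new|filter')

# Several -e options give the same "any of these" search; -i ignores case.
multi=$(printf '%s\n' "$sample" | grep -i -e 'PHP' -e 'filter')

printf '%s\n' "$either" '--' "$whole_word" '--' "$multi"
```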

Pattern characters and options in grep:

^ – beginning of the line
c – letter c
: – colon
$ – end of line
-v, --invert-match – invert the sense of matching, to select non-matching lines
-y – the same as -i (ignore case)