Using GREP on multiple files

Posted by: dmatrix1 Sep 10 2012, 07:17 PM

I am working with a bunch of Protein Database Files (.pdb) which contain information in the following pattern:

HEADER OXIDOREDUCTASE 26-FEB-12 4DWV
TITLE HORSE ALCOHOL DEHYDROGENASE COMPLEXED WITH NAD+ AND 2,3,4,5,6-
TITLE 2 PENTAFLUOROBENZYL ALCOHOL
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: ALCOHOL DEHYDROGENASE E CHAIN;
COMPND 3 CHAIN: A, B;
COMPND 4 EC: 1.1.1.1
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: EQUUS CABALLUS;
SOURCE 3 ORGANISM_COMMON: DOMESTIC HORSE,EQUINE;
SOURCE 4 ORGANISM_TAXID: 9796;
SOURCE 5 ORGAN: LIVER
KEYWDS ALCOHOL DEHYDROGENASE, NAD+, PENTAFLUOROBENZYL ALCOHOL, MICHAELIS
KEYWDS 2 COMPLEX, ROSSMANN FOLD, OXIDOREDUCTASE
EXPDTA X-RAY DIFFRACTION
AUTHOR B.V.PLAPP,S.RAMASWAMY
REVDAT 3 27-JUN-12 4DWV 1 JRNL
REVDAT 2 16-MAY-12 4DWV 1 JRNL
REVDAT 1 11-APR-12 4DWV 0

What I want to do is grep out the TITLE, AUTHOR, COMPND, SOURCE and REVDAT lines. I am using grep ^TITLE to get the title. The problem: ^ on its own will not work for the compound name, because the name is not on the first COMPND line. How can I write a script to grep out the second occurrence?

Posted by: mogwin Nov 21 2012, 10:54 AM

It's a little unclear what you are looking for.

If you're trying to use a single grep command to match all of the lines, then you can use "-e" to do that:
grep -e "^TITLE" -e "^AUTHOR" -e "^COMPND" -e "^SOURCE" -e "^REVDAT"
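Since the topic is about multiple files, here is a rough sketch of the same match run over every .pdb file in a directory. The function name pdb_records and the directory argument are made up for illustration, and -H (print the file name in front of each match) is assumed to be available, as it is in GNU grep.

```shell
# Hypothetical helper: print the selected record lines from every .pdb file
# in a directory (defaults to the current directory).
pdb_records() {
    dir="${1:-.}"
    # -E allows the (A|B|C) alternation, which is equivalent to several -e options.
    # -H prefixes each match with its file name, even when only one file matches.
    grep -H -E '^(TITLE|AUTHOR|COMPND|SOURCE|REVDAT)' "$dir"/*.pdb
}
```

Calling pdb_records /path/to/files prints lines of the form file.pdb:TITLE ... for each match. If the directory holds no .pdb files, the glob is passed to grep unexpanded and grep reports a missing file.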

If you're trying to get only the second COMPND line, then you can include more of the line in quotes:
grep "^COMPND 2"
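If the goal is "the second matching line" no matter what text follows the record name, one possible approach is to let grep find all matching lines and have sed keep only the Nth. This is just a sketch; nth_record and its arguments are hypothetical names. It also avoids depending on the exact spacing after COMPND, which in real PDB files is fixed-column and may differ from what is pasted in this thread.

```shell
# Hypothetical helper: print the Nth line that starts with a given record name.
# Usage: nth_record RECORD N FILE
nth_record() {
    record="$1"
    n="$2"
    file="$3"
    # grep keeps every line starting with the record name;
    # sed -n "Np" then prints only the Nth of those lines.
    grep "^$record" "$file" | sed -n "${n}p"
}

# Example loop over several files (assumes .pdb files in the current directory):
# for f in *.pdb; do nth_record COMPND 2 "$f"; done
```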

If these examples don't match what you had in mind, please clarify your question.

