Using GREP on multiple files
dmatrix1
I am working with a bunch of Protein Data Bank (.pdb) files, which contain information in the following pattern:

HEADER OXIDOREDUCTASE 26-FEB-12 4DWV
TITLE HORSE ALCOHOL DEHYDROGENASE COMPLEXED WITH NAD+ AND 2,3,4,5,6-
TITLE 2 PENTAFLUOROBENZYL ALCOHOL
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: ALCOHOL DEHYDROGENASE E CHAIN;
COMPND 3 CHAIN: A, B;
COMPND 4 EC: 1.1.1.1
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: EQUUS CABALLUS;
SOURCE 3 ORGANISM_COMMON: DOMESTIC HORSE,EQUINE;
SOURCE 4 ORGANISM_TAXID: 9796;
SOURCE 5 ORGAN: LIVER
KEYWDS ALCOHOL DEHYDROGENASE, NAD+, PENTAFLUOROBENZYL ALCOHOL, MICHAELIS
KEYWDS 2 COMPLEX, ROSSMANN FOLD, OXIDOREDUCTASE
EXPDTA X-RAY DIFFRACTION
AUTHOR B.V.PLAPP,S.RAMASWAMY
REVDAT 3 27-JUN-12 4DWV 1 JRNL
REVDAT 2 16-MAY-12 4DWV 1 JRNL
REVDAT 1 11-APR-12 4DWV 0

What I want to do is grep out the TITLE, AUTHOR, COMPND, SOURCE, and REVDAT lines. I am using grep ^TITLE to get the title. The problem: anchoring with ^ alone will not work for the compound name, because it is not on the first COMPND line but on the second. How can I write a script to grep out the second occurrence?
mogwin
It's a little unclear what you are looking for.

If you're trying to use a single grep command to match all of the lines, then you can use "-e" to do that:
grep -e "^TITLE" -e "^AUTHOR" -e "^COMPND" -e "^SOURCE" -e "^REVDAT" *.pdb
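
The repeated "-e" options can also be folded into one extended regular expression. Here is a minimal sketch of that, assuming GNU or BSD grep (for -H) and that the record name always starts the line; the function name pdb_records is made up for illustration:

```shell
# pdb_records is a hypothetical helper name, not part of grep.
# -E turns on extended regular expressions, so the record names can be
#    joined with | inside one anchored group.
# -H prefixes each match with its file name, which helps when several
#    files are searched at once.
pdb_records() {
    grep -H -E '^(TITLE|AUTHOR|COMPND|SOURCE|REVDAT)' "$@"
}

# Usage: pdb_records *.pdb
```

With more than one file on the command line grep prints the file name anyway; -H just makes that happen for a single file too.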

If you're trying to get only the second COMPND line, then you can include more of the line in quotes:
grep "^COMPND 2" *.pdb
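
If "second occurrence" is meant literally (the second line starting with COMPND, whatever follows the record name), grep alone cannot count matches per file, but awk can. This is a rough sketch, not the only way to do it; second_compnd is a made-up function name:

```shell
# second_compnd is a hypothetical helper name.
# Prints the second line beginning with COMPND from each file given,
# prefixed with the file name.
second_compnd() {
    awk 'FNR == 1 { n = 0 }   # first line of a new file: reset the counter
         /^COMPND/ && ++n == 2 { print FILENAME ": " $0 }' "$@"
}

# Usage: second_compnd *.pdb
```

FNR restarts at 1 for every input file, so the counter n is per file rather than global. Changing the 2 picks a different occurrence.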

If these examples don't match what you had in mind, please clarify your question.
