Funseq
From GersteinInfo
Revision as of 20:44, 6 May 2013
Installation
A. Required Tools
The following tools are REQUIRED for FunSeq: 
1) Bedtools 
2) Samtools 
3) Tabix 
4) VAT - A good installation guide for VAT can be found here. 
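Before installing FunSeq, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch: the helper name check_tools is hypothetical, and VAT is left out because the names of its executables depend on how it was installed.

```shell
# Report which of the given commands cannot be found on PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the three tools whose command names are well known.
check_tools bedtools samtools tabix || echo "Install the missing tools before continuing."
```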
B. PERL Requirement
1) Please make sure you have Perl 5 or later. The latest Perl can be downloaded here. 
2) Install the Parallel::ForkManager package (used for running jobs in parallel). The Perl library can be found here.
C. FunSeq tool installation
FunSeq is a Perl-based tool for Linux/UNIX. At the command-line prompt, enter the following: 
$ cd FUNSEQ/ 
$ perl Makefile.PL 
$ make 
$ make test 
$ make install 
D. Required Data Files
Please download all of the following data files from ' http://funseq.gersteinlab.org/data/ ' and put them in a new folder ' $path/funseq-0.1/data/ ': 
	1.	1kg.phase1.snp.bed.gz   (bed format) 
			Contents : all 1KG Phase I SNVs in bed format. 
			Columns : chromosome, SNV start position (0-based), SNV end position, MAF (minor allele frequency) 
			Purpose : to filter out common variants against 1KG SNVs. 
	2.	ENCODE.annotation.gz   (bed format) 
			Contents : annotation files compiled from ENCODE, Gencode v7 and other sources, including DHS, TF peaks, pseudogenes, ncRNAs and enhancers. 
			Columns : chromosome, annotation start position (0-based), annotation end position, annotation name. 
			Purpose : to find SNVs in annotated regions. 
	3.	ENCODE.tf.bound.union.bed  (bed format) 
			Contents : transcription factor (TF) motifs in ENCODE TF peaks. 
			Columns : chromosome, start position (0-based), end position, motif name, , strand, TF name 
			Purpose : used for motif breaking analysis. 
	4.	gencode7.cds.bed  (bed format) 
			Contents : CDS information extracted from Gencode v7. 
			Columns : chromosome, start position, end position 
			Purpose : to extract SNVs in CDS regions. 
	5.	gencode.v7.promoter.bed  (bed format) 
			Contents : compiled promoter regions, -2.5kb from the transcription start site (TSS). 
			Columns : chromosome, start, end, gene, whether the gene is a hub in the protein-protein interaction network (PPI) or the regulatory network (REG). 
			Purpose : to associate promoter SNVs with genes. 
	6.	gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval 
			Purpose : for the Variant Annotation Tool (VAT); Gencode v7. 
	7.	gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa 
			Purpose : for the Variant Annotation Tool (VAT); Gencode v7. 
	8.	DRM_transcript_pairs_modify 
			Contents : distal regulatory modules with gene information. 
			Purpose : to associate enhancer SNVs with genes. 
	9.	Pouya.motif 
			Contents : position weight matrices (PWMs). 
			Purpose : used for motif breaking calculation. 
	10.	PPI.hubs.txt 
			Purpose : defines hub genes in the protein-protein interaction network. 
	11.	REG.hubs.txt 
			Purpose : defines hub genes in the regulatory network. 
	12.	GENE.strong_selection.txt 
			Purpose : genes under strong negative selection, identified using the fraction of rare SNVs among non-synonymous variants. 
	13.	human_ancestor_GRCh37_e59/* 
			Contents : this directory contains human ancestral alleles for hg19 (GRCh37). 
			Purpose : for motif breaking calculation in personal or germline genomes. 
			* Note : for somatic analysis, these files are not needed. 
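After downloading, a quick check that every file listed above is in place can save a failed run. The sketch below uses the file names from the list; the helper name check_data_files and the FUNSEQ_DATA variable are hypothetical, and the default path ./data is only a placeholder for your own ' $path/funseq-0.1/data/ ' folder.

```shell
# List required FunSeq data files that are missing from a directory.
# Returns 0 if files 1-12 are all present, 1 otherwise.
check_data_files() {
    dir=$1
    status=0
    for f in \
        1kg.phase1.snp.bed.gz \
        ENCODE.annotation.gz \
        ENCODE.tf.bound.union.bed \
        gencode7.cds.bed \
        gencode.v7.promoter.bed \
        gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval \
        gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa \
        DRM_transcript_pairs_modify \
        Pouya.motif \
        PPI.hubs.txt \
        REG.hubs.txt \
        GENE.strong_selection.txt
    do
        [ -f "$dir/$f" ] || { echo "missing file: $f"; status=1; }
    done
    # The ancestral-allele directory is only needed for germline or personal genomes.
    [ -d "$dir/human_ancestor_GRCh37_e59" ] ||
        echo "note: human_ancestor_GRCh37_e59/ not found (not needed for somatic analysis)"
    return "$status"
}

# FUNSEQ_DATA is a hypothetical variable; point it at your data folder.
check_data_files "${FUNSEQ_DATA:-./data}" || echo "Some data files are missing."
```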
Running FunSeq
Usage : ./funseq -f file -maf maf -m <1/2> -inf <bed/vcf> -outf <bed/vcf>
       Options :
               	-f              user input SNV file
               	-maf            minor allele frequency (MAF) threshold used to filter against 1KG Phase I SNVs (value between 0 and 1)
               	-m              1 - somatic genome; 2 - germline or personal genome
               	-inf            input format - BED or VCF
               	-outf           output format - BED or VCF
       Default : -maf 0 -m 1 -outf vcf
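As an illustration of how the options fit together, the sketch below checks the option values against the ranges given in the usage text and then prints the resulting command line without running it. The helper name funseq_cmd and the input file name snvs.vcf are hypothetical, and accepting only lowercase bed and vcf is an assumption based on the usage line.

```shell
# Print a FunSeq command line after checking the option values.
# Usage: funseq_cmd FILE MAF MODE INPUT_FORMAT OUTPUT_FORMAT
funseq_cmd() {
    file=$1; maf=${2:-0}; mode=${3:-1}; inf=$4; outf=${5:-vcf}
    case "$mode" in
        1|2) ;;
        *) echo "error: -m must be 1 or 2" >&2; return 1 ;;
    esac
    case "$inf" in
        bed|vcf) ;;
        *) echo "error: -inf must be bed or vcf" >&2; return 1 ;;
    esac
    case "$outf" in
        bed|vcf) ;;
        *) echo "error: -outf must be bed or vcf" >&2; return 1 ;;
    esac
    # The MAF threshold must be a plain decimal number between 0 and 1.
    if ! awk -v m="$maf" 'BEGIN { exit !(m ~ /^[0-9]*\.?[0-9]+$/ && m + 0 >= 0 && m + 0 <= 1) }'; then
        echo "error: -maf must be between 0 and 1" >&2
        return 1
    fi
    echo "./funseq -f $file -maf $maf -m $mode -inf $inf -outf $outf"
}

# Germline analysis of a VCF file with a 1% MAF threshold and BED output.
funseq_cmd snvs.vcf 0.01 2 vcf bed
```

The printed line can be copied to the prompt once the values look right.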
