Funseq
From GersteinInfo
Revision as of 20:48, 6 May 2013
Installation
A. Required Tools
The following tools are REQUIRED for FunSeq: 
1) Bedtools 
2) Samtools 
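3) Tabix (http://sourceforge.net/projects/samtools/files/tabix/) 
4) VAT (http://vat.gersteinlab.org/index.php) - A good installation guide for VAT can be found at http://ngsda.blogspot.com/2011/06/vat.html. 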
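Before installing FunSeq, it can help to confirm that the command-line tools are on your PATH. The following is a minimal sketch and not part of FunSeq: missing_tools is a hypothetical helper name, and it assumes the tools are installed under their usual executable names (bedtools, samtools, tabix). VAT is not checked because its program names are not listed on this page.

```shell
# Hypothetical helper (not part of FunSeq): print each named program
# that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: report which of the required tools still need installing.
missing_tools bedtools samtools tabix
```

If nothing is printed, all three tools were found.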
B. PERL Requirement
1) Make sure you have Perl 5 or later. The latest Perl can be downloaded from http://www.perl.org/. 
2) Install the Parallel::ForkManager package (used for running jobs in parallel). The Perl library can be found at http://search.cpan.org/~szabgab/Parallel-ForkManager-1.03/lib/Parallel/ForkManager.pm.
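One way to check whether the module is already available is to ask Perl to load it. This is a sketch under assumptions: has_perl_module is a hypothetical helper name, and the suggested cpan command assumes the CPAN client that ships with Perl is set up on your system.

```shell
# Hypothetical helper (not part of FunSeq): succeed if Perl can load the module.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if has_perl_module Parallel::ForkManager; then
    echo "Parallel::ForkManager is installed"
else
    # The cpan client shipped with Perl is one way to install it.
    echo "Parallel::ForkManager is missing; try: cpan Parallel::ForkManager"
fi
```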
C. FunSeq tool installation
FunSeq is a PERL- and Linux/UNIX-based tool. At the command-line prompt, enter the following: 
$ cd FUNSEQ/ 
$ perl Makefile.PL 
$ make 
$ make test 
$ make install 
D. Required Data Files
Download all of the following data files from http://funseq.gersteinlab.org/data/ and put them in a new folder, $path/funseq-0.1/data/ : 
	1.	1kg.phase1.snp.bed.gz   (bed format) 
			Contents : all 1KG phaseI SNVs in bed format. 
			Columns : chromosome, SNV start position (0-based), SNV end position, MAF (minor allele frequency) 
			Purpose : to filter out common variants against 1KG SNVs. 
	2.	ENCODE.annotation.gz   (bed format) 
			Contents : compiled annotation files from ENCODE, Gencode v7 and others, including DHS, TF peaks, pseudogenes, ncRNAs and enhancers. 
			Columns : chromosome, annotation start position (0-based), annotation end position, annotation name. 
			Purpose : to find SNVs in annotated regions. 
	3.	ENCODE.tf.bound.union.bed  (bed format) 
			Contents : transcription factor (TF) motifs in ENCODE TF peaks.  
			Columns : chromosome, start position (0-based), end position, motif name, , strand, TF name 
			Purpose : used for motif breaking analysis 
	4.	gencode7.cds.bed  (bed format) 
			Contents : extracted CDS information from Gencode7. 
			Columns :  chromosome, start position, end position  
			Purpose : extract SNVs in CDS region 
	5.	gencode.v7.promoter.bed  (bed format) 
			Contents : compiled promoter regions, -2.5kb from transcription start site (TSS) 
			Columns : chromosome, start, end, gene, whether the gene is a hub in protein-protein interaction network (PPI) or regulatory network (REG). 
			Purpose : correlate promoter SNVs with gene 
	6.	gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval 
			Purpose : For Variant Annotation Tool (VAT); Gencode v7. 
	7.	gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa 
			Purpose : For Variant Annotation Tool (VAT); Gencode v7. 
	8.	DRM_transcript_pairs_modify 
			Contents : distal regulatory module with gene information. 
			Purpose : correlate enhancer SNVs with gene 
	9.	Pouya.motif 
			Contents : PWMs 
			Purpose : used for motif breaking calculation 
	10.	PPI.hubs.txt 
			Purpose : defined hub genes in protein-protein interaction network 
	11.	REG.hubs.txt 
			Purpose : defined hub genes in regulatory network 
	12.	GENE.strong_selection.txt 
			Purpose : genes under strong negative selection, defined using the fraction of rare SNVs among non-synonymous variants. 
	13.	human_ancestor_GRCh37_e59/* 
			Contents : this directory contains human ancestral alleles for hg19 (GRCh37). 
			Purpose : for motif breaking calculation in personal or germ-line genome. 
			* Note :  for somatic analysis, these files are not needed. 
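After downloading, the data folder can be checked against the list above. This is a minimal sketch, not part of FunSeq: missing_data_files is a hypothetical helper name, the file names are copied from the list, and the human_ancestor_GRCh37_e59 directory is only checked for existence, not for its contents.

```shell
# Hypothetical helper (not part of FunSeq): list the required data files
# that are absent from the given data directory, one per line.
missing_data_files() {
    data_dir=$1
    for f in \
        1kg.phase1.snp.bed.gz \
        ENCODE.annotation.gz \
        ENCODE.tf.bound.union.bed \
        gencode7.cds.bed \
        gencode.v7.promoter.bed \
        gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.interval \
        gencode.v7.annotation.GRCh37.cds.gtpc.ttpc.fa \
        DRM_transcript_pairs_modify \
        Pouya.motif \
        PPI.hubs.txt \
        REG.hubs.txt \
        GENE.strong_selection.txt \
        human_ancestor_GRCh37_e59
    do
        # -e is true for both regular files and directories.
        [ -e "$data_dir/$f" ] || echo "$f"
    done
}
```

For example, missing_data_files "$path/funseq-0.1/data" prints nothing when every entry is present. For somatic analysis, a missing human_ancestor_GRCh37_e59 entry can be ignored.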
Running FunSeq
Usage : ./funseq -f file -maf maf -m <1/2> -inf <bed/vcf> -outf <bed/vcf>
       Options :
               	-f              user input SNVs file
               	-maf            Minor Allele Frequency (MAF) threshold to filter 1KG phaseI SNVs (value 0 ~ 1)
               	-m              1 - somatic Genome; 2 - germline or personal Genome
               	-inf            input format - BED or VCF
               	-outf           output format - BED or VCF
Default : -maf 0 -m 1 -outf vcf
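
To make the options concrete, the sketch below assembles a command line from the usage text above and rejects values outside the documented ranges. It is illustrative only: funseq_cmd is a hypothetical helper name, snvs.vcf is a hypothetical input file, and the function prints the command rather than running FunSeq.

```shell
# Hypothetical helper (not part of FunSeq): check the arguments against the
# usage text, then print the matching command line without running it.
funseq_cmd() {
    input=$1 maf=$2 mode=$3 inf=$4 outf=$5
    # -maf must be a number between 0 and 1.
    awk -v m="$maf" 'BEGIN { exit !((m ~ /^[0-9]*\.?[0-9]+$/) && (m + 0 <= 1)) }' ||
        { echo "maf must be between 0 and 1" >&2; return 1; }
    # -m selects 1 (somatic) or 2 (germline or personal genome).
    case $mode in 1|2) ;; *) echo "mode must be 1 or 2" >&2; return 1 ;; esac
    # -inf and -outf each take bed or vcf.
    for fmt in "$inf" "$outf"; do
        case $fmt in bed|vcf) ;; *) echo "format must be bed or vcf" >&2; return 1 ;; esac
    done
    echo "./funseq -f $input -maf $maf -m $mode -inf $inf -outf $outf"
}

# Germline run on a hypothetical VCF input with a MAF threshold of 0.01 and BED output.
funseq_cmd snvs.vcf 0.01 2 vcf bed
```

The printed line can be copied to the prompt from inside the FunSeq directory.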
