User:Wayne Decatur/Sequence analysis tools
From Proteopedia
Have not Categorized Yet
- Plasmapper
- Online Schematic plasmid drawing tool
- Online restriction mapper
- NEBcutter
- old web cutter
- Sequence Manipulation Suite: Restriction Map - links to their other tools are on the left side of the page
- Biotools at UMASS MED (formerly included EMBOSS)
- `cons` alignment consensus program and many others at EMBOSS explorer website
- Links to many EMBOSS portals, servers and mirrors under 'Servers'
- MUSCLE: MUltiple Sequence Comparison by Log-Expectation
- Archaeopteryx for the visualization of annotated phylogenetic trees.
- Netprimer
- Genomicus: Genomes in Evolution - "genome browser that enables users to navigate in genomes in several dimensions: linearly along chromosome axes, transversaly across different species, and chronologically along evolutionary time."
- SeqTrace - "is an application for viewing and processing DNA sequencing chromatograms (trace files). SeqTrace makes it easy to quickly generate high-quality finished sequences from a large number of trace files. SeqTrace can automatically identify, align, and compute [contig] consensus sequences from matching forward and reverse traces, filter low-quality base calls, and perform end trimming of finished sequences. The finished DNA sequences can then be exported to common sequence file formats, such as FASTA." Written in Python.
- CAP3 Sequence Assembly Program - online web server for making contigs from DNA sequences. The "form allows you to assemble a set of contiguous sequences (contigs) with the CAP3 program."
- Nucleobytes - DNA editor and 4peaks sequence chromatogram viewer along with other mac software
- PaxDb: Protein Abundance Across Organisms
- PrePPI: database of predicted and experimentally determined protein-protein interactions (PPIs) for yeast and human.
- T-profiler - for scoring the activity of pre-defined groups of yeast genes using gene expression data. **As of May 2016 it was not accepting uploads.**
- g:Profiler - for characterizing and manipulating gene lists of high-throughput genomics. Handles yeast and many other organisms.
- ProViz - a web-based visualization tool to investigate the functional and evolutionary features of protein sequences.
- ProDy Project - "ProDy is a free and open-source Python package for protein structural dynamics analysis". It also appears to handle protein sequence analysis and working with PDB files.
Aligning
- Muscle-binder - Launchable Jupyter environment for running command line-based MUSCLE via Binder. That page also links to the main MUSCLE resources.
BLAST+
- Blast-binder - Launchable Jupyter environment for running command line-based BLAST via Binder. That page also links to the main BLAST resources. The launched notebooks illustrate ways to easily work with the output in Python.
Circos
- Circos on Jupyter - Circos in your browser-based Jupyter environment served from MyBinder.org, so Circos is available in a browser with one click that launches a Jupyter environment via Binder. That page also links to the main Circos resources. The launched notebooks illustrate ways to easily work with the output in Python.
Converters
- ALTER (ALignment Transformation EnviRonment) - complex interface but offers lots of options for output. I used it as part of my workflow to get closer to special NEXUS format (or intermediate) for performing maximum likelihood phylogenetic analysis of large sets of sequences.
- Sequence conversion provided by bugaco.com - a lot of conversion choices with an easy interface. When I had an alignment in interleaved Clustal format, it converted it nicely to a straight FASTA listing of the sequence for every organism.
- Reformat utility of Max Planck Institute for Developmental Biology Bioinformatics Toolkit converts sequences or multiple sequence alignments to various forms.
- Format Converter - converts nucleotide and protein sequences in various formats to a lot of other formats.
- Three to One converts three letter amino acid sequence translations to single letter translations.
- One to Three converts single letter amino acid sequence translations to three letter translations.
- ConvertSeq folder at github - my own converter scripts.
- g:Convert - Gene ID Converter. Handles yeast and a very large list of other organisms.
- seqmagick - An imagemagick-like frontend to Biopython SeqIO; can convert from fasta to phylip, etc.
- [Reverse and/or reverse complement DNA sequences (handles degenerate bases)](http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html)
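For quick jobs where a web form is overkill, reverse complementing with degenerate bases can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the tool linked above; the function name `reverse_complement` is my own choice for illustration, and the mapping follows the standard IUPAC nucleotide ambiguity codes (for example, R = A/G pairs with Y = C/T).

```python
# Translation table for IUPAC nucleotide codes, including degenerate bases.
# Each character in the first string is replaced by the character at the
# same position in the second string; case is preserved.
_IUPAC_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    Characters outside the IUPAC table (gaps, whitespace, etc.) are
    passed through unchanged, only reversed in position.
    """
    return seq.translate(_IUPAC_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCRN"))
```

Applying the function twice returns the original sequence, which makes for an easy sanity check on the table.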
Random sequence generators
- http://users-birc.au.dk/biopv/php/fabox/random_sequence_generator.php
- http://www.bioinformatics.org/sms2/random_dna.html
- http://www.faculty.ucr.edu/~mmaduro/random.htm
- http://molbiol.ru/eng/scripts/01_16.html
- also see resources listed at the bottom of my gene expression page at my simulated_data repo.
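When a script needs random sequence rather than a copy-and-paste from one of the generators above, the Python standard library is enough. The following is a minimal sketch under my own assumptions: the function name `random_dna` and its `gc_content` and `seed` parameters are hypothetical and are not taken from any of the linked tools.

```python
import random


def random_dna(length, gc_content=0.5, seed=None):
    """Return a random DNA string of the given length.

    gc_content is the expected fraction of G+C (between 0 and 1).
    Passing a seed makes the result reproducible.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    rng = random.Random(seed)
    at_weight = (1.0 - gc_content) / 2  # weight for A and for T
    gc_weight = gc_content / 2          # weight for C and for G
    bases = rng.choices(
        "ACGT",
        weights=[at_weight, gc_weight, gc_weight, at_weight],
        k=length,
    )
    return "".join(bases)


if __name__ == "__main__":
    print(random_dna(60, gc_content=0.4, seed=1))
```

Each base is drawn independently, so the G+C fraction of a short sequence will only approximate `gc_content`.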
Sequence shufflers
- http://emboss.sourceforge.net/ - shuffleseq from EMBOSS shuffles a set of sequences maintaining composition.
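The idea of shuffling while maintaining composition can be illustrated with a short Python sketch. This is not shuffleseq's implementation, only the same basic idea as I understand it: permute the residues of a sequence so the counts of each letter stay identical. The function name `shuffle_sequence` and the `seed` parameter are my own assumptions.

```python
import random


def shuffle_sequence(seq, seed=None):
    """Return a random permutation of seq.

    Single-residue composition is preserved exactly because the residues
    are only reordered; higher-order patterns (dinucleotides, motifs)
    are not preserved.
    """
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)  # in-place shuffle of the copy
    return "".join(residues)


if __name__ == "__main__":
    original = "ATGGCGTACGTTAGC"
    print(original)
    print(shuffle_sequence(original, seed=42))
```

If dinucleotide frequencies also need to be preserved, a plain permutation like this is not sufficient and a dedicated tool is the better choice.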
Extract physico-chemical data from Protein or DNA sequences
- Seq2Feature webserver is a comprehensive web-based feature extraction tool which computes protein and DNA sequence driven features. It can calculate 252 protein- based and 42 DNA- based descriptors. Major protein sequence based descriptors include physico-chemical, energetic and conformational properties, mutation matrices and contact potentials. There is a corresponding article here.
Orthology
- EggNOG - A database of orthologous groups and functional annotation
- HH-suite3 for sensitive protein sequence searching based on HMM-HMM alignment
Pattern Matching
- patmatch-binder - Launchable Jupyter environment for running command line-based PatMatch via Binder. That page also links to other sequence pattern matching resources. The launched notebooks illustrate ways to easily work with the output in Python.
- Infernal - builds consensus RNA secondary structure profiles called covariance models (CMs), and uses them to search nucleic acid sequence databases for homologous RNAs, or to create new sequence- and structure-based multiple sequence alignments.
Some sequence analysis but mostly OTHER
- BioCyc Database Collection - "BioCyc is a collection of 3530 Pathway/Genome Databases (PGDBs), with tools for understanding their data. Cellular Overview image generated by Pathway Tools. Explore Metabolic Maps for Thousands of Organisms. RouteSearch: Search for Paths through the Metabolic Network. Cross-Organism Search form generated by Pathway Tools. New: Search All of BioCyc for Genes, Proteins, Pathways. Search all of BioCyc or designated taxonomic groups for named genes, proteins, metabolites, pathways. Multiple Sequence Alignment results generated by Pathway Tools using MUSCLE. PatMatch query and results by Pathway Tools. SmartTable display generated by Pathway Tools. Metabolomics Data Analysis. Cellular Overview Omics Viewer image generated by Pathway Tools. Gene Expression Data Analysis. Multi-Genome Browser. Comparative Genome Analysis."
Good E. coli database
- EcoProDB - E. coli protein database (EcoProDB) integrates protein information identified on 2-D gels along with other resources to provide a comparative platform for the expression levels of many heterogeneous proteins under different genetic and environmental conditions, using an interactive interface and search mechanism.
NGS
- HOMER - "Software for motif discovery and next-gen sequencing analysis". Nice in that it actually explains some of the details and advantages of the browsers and file types.
Nucleic acid system building and DNA structure design
- NUPACK - "NUPACK is a growing software suite for the analysis and design of nucleic acid structures, devices, and systems." It also seems to be able to do melting temperature and free energy calculations, among other things.
Fungal Genome Resources
1011 Saccharomyces cerevisiae genomes, associated with Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Peter J, De Chiara M, Friedrich A, Yue JX, Pflieger D, Bergström A, Sigwalt A, Barre B, Freel K, Llored A, Cruaud C, Labadie K, Aury JM, Istace B, Lebrigand K, Barbry P, Engelen S, Lemainque A, Wincker P, Liti G, Schacherer J. Nature. 2018 Apr;556(7701):339-344. doi: 10.1038/s41586-018-0030-5. Epub 2018 Apr 11. PMID: 29643504.
332 budding yeasts associated with Tempo and Mode of Genome Evolution in the Budding Yeast Subphylum. Shen XX, Opulente DA, Kominek J, Zhou X, Steenwyk JL, Buh KV, Haase MAB, Wisecaver JH, Wang M, Doering DT, Boudouris JT, Schneider RM, Langdon QK, Ohkuma M, Endoh R, Takashima M, Manabe RI, Čadež N, Libkind D, Rosa CA, DeVirgilio J, Hulfachor AB, Groenewald M, Kurtzman CP, Hittinger CT, Rokas A. Cell. 2018 Nov 29;175(6):1533-1545.e20. doi: 10.1016/j.cell.2018.10.023. Epub 2018 Nov 8. PMID: 30415838. (Figshare corresponding to the paper)
http://1000.fungalgenomes.org/home/
http://fungidb.org/fungidb/ (about it –> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245123/)
http://genome.jgi.doe.gov/programs/fungi/1000fungalgenomes.jsf <— nice graphic of situation related to 1000 fungal genomes project
http://genome.jgi-psf.org/programs/fungi/index.jsf
http://fungi.ensembl.org/index.html
http://en.wikipedia.org/wiki/List_of_sequenced_fungi_genomes <– how current is it???
For genomic arrangement (synteny) comparisons/Fungal Genomics Resources
Synteny Viewer listed under every SGD gene on Sequence tab, near bottom of page
http://www.genomicus.biologie.ens.fr/genomicus-fungi-19.01/cgi-bin/search.pl
Yeast Gene Order Browser (YGOB)
RNA Structure Analysis
- Infernal - A downloadable program for sequence analysis using profiles of RNA sequence based on Rfam-associated covariance models and secondary structure consensus. The program can generate covariance models from RNA alignments as well. Binaries are available for Mac, Windows, and Linux. (E. P. Nawrocki and S. R. Eddy, Infernal 1.1: 100-fold faster RNA homology searches, Bioinformatics 29:2933-2935 (2013). PMID: 24008419)
- rna-tools - (previously known as 'rna-pdb-tools'): a toolbox to analyze sequences, structures and simulations of RNA. (Takes some navigating around to find what you want because a lot is there.)
Analyze DNA curvature
- bendit-binder - use the Bend.it software to predict DNA curvature from DNA sequences with the power of the Jupyter ecosystem served via MyBinder.org.
Sequence Logo Generation
Installable software for fine-tuning sequence alignments
- SEQOTRON - Mac software for adjusting sequence alignments by hand. Unfortunately it discards the conservation data if it is present in the input. I haven't found a way to put it back in the output other than running the `cons` alignment consensus program, one of many programs at the EMBOSS explorer website.
Windows equivalent is here but I have NOT tried it.
Python-based utilities
- seqmagick - An imagemagick-like frontend to Biopython SeqIO. For example, it can convert from fasta to phylip, remove gaps from a fasta-formatted sequence, and describe all FASTA files in the current directory. Requires Biopython.
- see also the 'Binder'/notebook-related items earlier on this page, as I have usually worked out Python code to shuttle the output of other command-line based software into Python, and the notebook-related items here, as I sometimes demonstrate script usage in launchable notebooks
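As a small illustration of the kind of task these utilities cover, removing gaps from FASTA-formatted text can be sketched in plain Python with no dependencies. This is not seqmagick's code or its command-line interface; the function name `degap_fasta` and the default gap characters (`-` and `.`) are assumptions for the example.

```python
def degap_fasta(text, gap_chars="-."):
    """Strip gap characters from the sequence lines of FASTA text.

    Header lines (starting with '>') are kept untouched. Sequence lines
    that consist only of gaps are dropped rather than left empty.
    """
    remove_gaps = str.maketrans("", "", gap_chars)
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            stripped = line.translate(remove_gaps)
            if stripped:
                cleaned.append(stripped)
    return "\n".join(cleaned)


if __name__ == "__main__":
    aligned = ">seq1\nAC-G-T\n>seq2\n--ACGT\n"
    print(degap_fasta(aligned))
```

For anything beyond a one-off, a Biopython-based tool such as seqmagick is likely the more robust route, since it also handles format detection and other file types.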
My own sequence work-related code
- Sequence manipulation Python code
- Working with UGENE analysis software
- Working with Yeastmine
- see also the 'Binder'/notebook-related items earlier on this page, as I have usually worked out Python code to shuttle the output of other command-line based software into Python, and the notebook-related items here, as I sometimes demonstrate script usage in launchable notebooks
- see also My Github