Remind students to install Bioconductor.
Remember to download Archaea genome data.
Set up R console:
install.packages("BiocManager")  # biocLite() is retired in current Bioconductor releases
BiocManager::install("ShortRead")
library(ShortRead)
Lists
- Lists are generic vectors whose elements can be any kind of object, including vectors of different types and other lists.
sites <- c("a", "b", "c")
notes <- "It was a good day in the field today. Warm, sunny, lots of gators."
helpers <- 4
field_notes <- list(sites, notes, helpers)
field_notes[1]
field_notes[[1]]
- We can also give the values names.
field_notes <- list(sites=sites, notes=notes, helpers=helpers)
field_notes$sites
field_notes[["sites"]]
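One way to show the difference between the access forms in class is to check what each one returns. This is a minimal sketch in base R that rebuilds the same field_notes list; the intermediate names (as_sublist, as_element) are only for illustration.

```r
sites <- c("a", "b", "c")
notes <- "It was a good day in the field today. Warm, sunny, lots of gators."
helpers <- 4
field_notes <- list(sites = sites, notes = notes, helpers = helpers)

# Single brackets keep the list wrapper: the result is a list of length 1
as_sublist <- field_notes["sites"]
class(as_sublist)    # "list"
length(as_sublist)   # 1

# Double brackets (and $) pull out the element itself
as_element <- field_notes[["sites"]]
class(as_element)    # "character"
length(as_element)   # 3

# $ and [[ ]] with a name return the same thing
identical(field_notes$sites, field_notes[["sites"]])  # TRUE

# Because [[ ]] returns the vector, it can be indexed again directly
second_site <- field_notes[["sites"]][2]
second_site          # "b"
```

The practical rule of thumb: use single brackets to get a smaller list, and double brackets or $ to get at the contents.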
Objects
- In R, data is stored in objects, and class() reports which data structure an object is.
  - vector()
    - "character"
    - "integer"
    - "numeric"
  - data.frame()
    - "data.frame"
  - list()
    - "list"
- We can also make arbitrary objects that store whatever kinds of
data we need.
- Genome sequences
- Geographical information
- Evolutionary trees
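To make the idea of classes and custom objects concrete, the sketch below first checks class() on the built-in structures and then defines a small object type of its own. It uses S4, one of several object systems in R. The class name GenomeRecord and its slots organism and sequence are hypothetical, invented for this illustration only.

```r
library(methods)

# class() on the built-in data structures
class(c("a", "b"))        # "character"
class(1:3)                # "integer"
class(c(1.5, 2))          # "numeric"
class(data.frame(x = 1))  # "data.frame"
class(list(1, "a"))       # "list"

# A hypothetical S4 class with two named slots
setClass("GenomeRecord",
         slots = c(organism = "character", sequence = "character"))

# Create one object of that class (made-up values)
rec <- new("GenomeRecord", organism = "Example archaeon", sequence = "ATGCGT")

# The object knows its class, and slots are read with @
is(rec, "GenomeRecord")     # TRUE
slotNames("GenomeRecord")   # "organism" "sequence"
rec@sequence                # "ATGCGT"
nchar(rec@sequence)         # 6
```

Slots are typed, so trying to store a number in the sequence slot would raise an error when the object is created.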
Bioconductor
- Bioconductor is software for bioinformatics that includes ShortRead for working with genomic data in R.
- We’re using genomic data from GenBank.
  - Archaea genome
  - Coding regions
  - .FASTA
    - Format stores nucleotide sequences
reads <- readFasta("data/archaea_dna/T-pendens.fasta")
- reads is a special kind of object class, ShortRead.
reads
- The str() of ShortRead includes other kinds of objects.
  - Object access:
    - using the @ operator
    - using ShortRead functions
  - "DNAStringSet" holds groups of sequences
    - @sread
    - sread()
- Object access:
str(reads)
reads@sread
reads@sread[[1]]
reads@sread@ranges
reads@sread@ranges@start
sread(reads)
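If the Archaea data is not at hand, the two access styles can be tried on a toy file. This is a sketch that assumes ShortRead is installed; the two sequences and the names fasta_path and toy_reads are made up for illustration and are not part of the course data.

```r
library(ShortRead)

# Write a tiny two-record FASTA file to a temporary location (made-up sequences)
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ATGCATGC", ">seq2", "GGGTTTAAACCC"), fasta_path)

toy_reads <- readFasta(fasta_path)
class(toy_reads)

# Direct slot access and the accessor function reach the same sequences
from_slot <- as.character(toy_reads@sread)
from_accessor <- as.character(sread(toy_reads))
identical(from_slot, from_accessor)

# Accessor functions also exist for other parts of the object
seq_lengths <- width(toy_reads)  # length of each sequence
seq_lengths
id(toy_reads)                    # record names from the FASTA headers
```

Accessor functions such as sread() are usually the safer habit, since they keep working even if a package author reorganizes the slots inside the object.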
- Managing and manipulating complex data structures
- Hard to get right in all cases
- Best to rely on existing tools
- Someone has probably already developed a tool for your data structure.
- Other useful ShortRead functions
reverse(reads@sread)
complement(reads@sread)
reverseComplement(reads@sread)
alphabetFrequency(reads@sread)
translate(reads@sread)
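These functions are easier to follow on sequences short enough to check by hand. The sketch below builds a small DNAStringSet in memory instead of reading a file. It assumes the Biostrings package is installed (it is loaded along with ShortRead and is where these sequence functions are defined); the two sequences are made up.

```r
library(Biostrings)

# Two short made-up sequences, each a whole number of codons
toy <- DNAStringSet(c("ATGGCC", "TTTAAA"))

rev_seqs <- as.character(reverse(toy))
rev_seqs      # "CCGGTA" "AAATTT"

comp_seqs <- as.character(complement(toy))
comp_seqs     # "TACCGG" "AAATTT"

revcomp_seqs <- as.character(reverseComplement(toy))
revcomp_seqs  # "GGCCAT" "TTTAAA"

# Base counts per sequence; keep only the four standard bases
base_counts <- alphabetFrequency(toy)[, c("A", "C", "G", "T")]
base_counts

# Translate codons to amino acids: ATG GCC -> M A, TTT AAA -> F K
proteins <- as.character(translate(toy))
proteins      # "MA" "FK"
```

Each function is vectorized over the set, so the same calls work unchanged on the thousands of sequences in a real genome file.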
Assign Exercise 6 - Multiple Files.