AttributeError: 'list' object has no attribute 'SeqRecord' - Slice multiple sequences with Biopython>SeqIO from fasta file

I am trying to generate N- and C-terminal slices of varying length (1, 2, 3, 4, 5, 6, 7). But before I get there, I am having problems just reading in my FASTA files. I was following the 'Random subsequences' section at the top of https://biopython.org/wiki/SeqIO, but that example only has one sequence, so maybe that is where I went wrong. Below are example sequences, my code and my errors. Any help would be much appreciated; I am clearly out of my depth. Thanks!

Two example sequences in my file domains.fasta:

>GA98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE
>GB98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE

My code that is not working:

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
# Load data:
domains = list(SeqIO.parse("domains.fa",'fasta'))
#set up receiving arrays
home=[]
num=1
#slice data
for i in range(0, 6):
    num = num+1
    domain = domains
    seq_n = domains.seq[0:num]
    seq_c = domains.seq[len(domain)-num:len(domain)]
    name = domains.id
    record_d = SeqRecord(domain,'%s' % (name), '', '')
    home.append(record_d)
    record_n = SeqRecord(seq_n,'%s_n_%i' % (name,num), '', '')
    home.append(record_n)
    record_c = SeqRecord(seq_c,'%s_c_%i' % (name,num), '', '')
    home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")

The error I get is:

Traceback (most recent call last):
  File "~/fasta_nc_sequences.py", line 20, in <module>
    seq_n = domains.seq[0:num]
AttributeError: 'list' object has no attribute 'SeqRecord'

When I print out 'domains = list(SeqIO.parse("domains.fa",'fasta'))' I get this:

[SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE', SingleLetterAlphabet()), id='GA98', name='GA98', description='GA98', dbxrefs=[]), SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE', SingleLetterAlphabet()), id='GB98', name='GB98', description='GB98', dbxrefs=[])]

I am not sure why I cannot access what is inside the SeqRecord. Maybe it is because I wrapped SeqIO.parse in a list, which I only did because before that I was getting a different error:

AttributeError: 'generator' object has no attribute 'seq'

You have to define which element of the list you want to access, e.g.

seq_n = domains[0].seq[0:num]
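To see why domains.seq fails while domains[0].seq works, here is a minimal plain-Python sketch. Record and parse are hypothetical stand-ins for SeqRecord and SeqIO.parse (an assumption, so it runs without Biopython): both the generator and the list are only containers, and only the records inside them carry a .seq attribute.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord: just an id and a sequence."""
    id: str
    seq: str


def parse(entries):
    """Hypothetical stand-in for SeqIO.parse: lazily yields one Record per entry."""
    for rec_id, seq in entries:
        yield Record(rec_id, seq)


entries = [("GA98", "TTYKLILNLK"), ("GB98", "TTYKLILNLK")]

records_gen = parse(entries)         # a generator, like SeqIO.parse(...)
records_list = list(parse(entries))  # a list, like list(SeqIO.parse(...))

# Neither container has a .seq attribute; only the records inside do.
print(hasattr(records_gen, "seq"))   # False
print(hasattr(records_list, "seq"))  # False

# Index into the list (or loop over it) to reach a single record's .seq.
print(records_list[0].seq[0:2])      # TT
```

Wrapping the generator in list() therefore only changes the error message; the fix is to pick an element or iterate.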

By the way:

domain = domains

Why do you copy domains to domain and never use it later?

fin swimmer

I was trying to slice all the sequences in the list. Do I have to iterate through them in an additional for loop? For some reason I was under the impression that SeqIO.parse() would handle that...

I use domain later in record_d = SeqRecord(domain,'%s' % (name), '', '') so that I keep a copy of the complete domains as well as the sliced sequences.

Leiven is correct: you're one level too 'high' in your list.

You can't slice all elements of a list in one go like that. (You might be able to hack something together with map() and some of the object's hidden methods, but that's not a good way to go.)

You have a few options:

1. Use SeqIO in a loop:

for record in SeqIO.parse(...):
    for i in range(0, 6):
        # do slicing

2. Use another loop over your list of domains. This is functionally equivalent to option 1, but the loops can be nested range-then-sequence instead of sequence-then-range (I don't know which is better, but I suspect the former). This will be slower than option 1, though probably negligibly so.

3. Use list comprehensions. These are a bit faster and can lead to more compact code, but they aren't the easiest if you're new to Python.

Fundamentally however, all of the above are just extra layers of loops. I'd go with option 1 personally.
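As a rough illustration of option 3, here is a list-comprehension sketch of the N/C-terminal slicing. It assumes plain (id, sequence) string pairs in place of SeqRecord objects so it runs without Biopython; slicing a Seq with [:n] and [-n:] works the same way. The lengths run from 1 to 7 as in the question.

```python
# Plain (id, sequence) pairs standing in for SeqRecord objects (assumption).
domains = [
    ("GA98", "TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE"),
    ("GB98", "TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE"),
]
lengths = range(1, 8)  # slice lengths 1..7

# Each comprehension is two nested loops: records on the outside, lengths inside.
n_slices = [(f"{name}_n_{n}", seq[:n]) for name, seq in domains for n in lengths]
c_slices = [(f"{name}_c_{n}", seq[-n:]) for name, seq in domains for n in lengths]

print(n_slices[:2])  # [('GA98_n_1', 'T'), ('GA98_n_2', 'TT')]
print(c_slices[-1])  # ('GB98_c_7', 'KTFTVTE')
```

It is compact, but as noted above, the explicit loop of option 1 is easier to read and debug if you're new to Python.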

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# Load data: one SeqRecord per FASTA entry
domains = list(SeqIO.parse("examples/data/domains.fa", "fasta"))
print(domains)
# Set up receiving list
home = []
# Subset data: loop over the records first, then over the slice lengths
for record in domains:
    domain = record.seq
    name = record.id
    # Keep a copy of the full-length domain
    record_d = SeqRecord(domain, name, "", "")
    home.append(record_d)
    # Slice lengths 1..6 (use range(1, 8) to include 7)
    for num in range(1, 7):
        seq_n = record.seq[:num]   # first num residues (N terminus)
        seq_c = record.seq[-num:]  # last num residues (C terminus)
        record_n = SeqRecord(seq_n, "%s_n_%i" % (name, num), "", "")
        home.append(record_n)
        record_c = SeqRecord(seq_c, "%s_c_%i" % (name, num), "", "")
        home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")
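To check what domains_variants.fasta should contain, here is a hedged plain-Python sketch of the same loop structure that builds the FASTA text directly instead of calling SeqIO.write. The function names terminal_variants and to_fasta and the max_len parameter are hypothetical; max_len=6 matches the range used above, and max_len=7 gives the 1 to 7 range from the question. Unlike Biopython's writer, this sketch never wraps long sequence lines.

```python
def terminal_variants(records, max_len=6):
    """Yield (id, seq) for each full record, then its N/C slices of length 1..max_len.

    records is any iterable of (id, sequence) pairs, a hypothetical stand-in
    for SeqRecord objects.
    """
    for name, seq in records:
        yield name, seq  # full-length domain, like record_d
        for num in range(1, max_len + 1):
            yield f"{name}_n_{num}", seq[:num]   # N-terminal slice
            yield f"{name}_c_{num}", seq[-num:]  # C-terminal slice


def to_fasta(pairs):
    """Format (id, seq) pairs as FASTA: one header line and one sequence line each."""
    return "".join(f">{name}\n{seq}\n" for name, seq in pairs)


domains = [("GA98", "TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE")]
fasta_text = to_fasta(terminal_variants(domains, max_len=2))
print(fasta_text)
```

With max_len=2 the output is the full GA98 record followed by GA98_n_1 (T), GA98_c_1 (E), GA98_n_2 (TT) and GA98_c_2 (TE), in the same order the loop above appends them to home.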