After training a model on 200 examples, I can't run binary teach. When I run the command below I get an error; any ideas what's going wrong? (Searching doesn't help, and I'm not sure exactly what's wrong in my dataset.)

prodigy ner.teach ner_st2_skills ./model/model-best ./data.jsonl --label SKILL

Using 1 label(s): SKILL
Traceback (most recent call last):
  File "/Users/fed/.pyenv/versions/3.9.2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/fed/.pyenv/versions/3.9.2/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/prodigy/recipes/ner.py", line 71, in teach
    model = EntityRecognizer(nlp, label=label)
  File "cython_src/prodigy/models/ner.pyx", line 340, in prodigy.models.ner.EntityRecognizer.__init__
  File "cython_src/prodigy/util.pyx", line 621, in prodigy.util.copy_nlp
  File "spacy/vocab.pyx", line 90, in spacy.vocab.Vocab.vectors.__set__
AttributeError: 'NoneType' object has no attribute 'strings'

My data.jsonl was built with a PhraseMatcher like this (from previously labelled data):

import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span

# nlp, data, skills and filterSkillsByConfidence are defined elsewhere in my script
for obj in data:
    doc = nlp(obj['text'])
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    matcher.add("SKILL", [nlp.make_doc(cls['value']) for cls in filterSkillsByConfidence(skills[obj['meta']['listingId']])])
    matches = matcher(doc)
    entities = list()
    for match_id, start, end in matches:
        entities.append(Span(doc, start, end, label='SKILL'))
    # keep only non-overlapping matches before assigning them as entities
    doc.ents = spacy.util.filter_spans(entities)
    obj["spans"] = [{"token_start": ent.start,
                     "token_end": ent.end - 1,
                     "start": ent.start_char,
                     "end": ent.end_char,
                     "text": ent.text,
                     "label": ent.label_} for ent in doc.ents]

And whenever I try to run:
poetry run python -m prodigy ner.correct ner_st2_skills ./model/model-best ./data.jsonl --label SKILL

The model you're using isn't setting sentence boundaries (e.g. via the parser or sentencizer). This means that incoming examples won't be split into sentences.
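
(As an aside, if sentence splitting is actually wanted, my understanding is that a rule-based sentencizer can be added to the saved pipeline before running the recipe; a minimal sketch, assuming the pipeline is at ./model/model-best:)

import spacy

nlp = spacy.load("./model/model-best")
# Add a rule-based sentencizer so incoming examples can be split into sentences.
nlp.add_pipe("sentencizer", first=True)
# Save under a new (hypothetical) path so the original pipeline stays untouched.
nlp.to_disk("./model/model-best-sents")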

============================== Info about spaCy ==============================
spaCy version    3.2.1                         
Location         /Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/spacy
Platform         macOS-11.6.1-x86_64-i386-64bit
Python version   3.9.2                         
Pipelines        en_core_web_md (3.2.0), en_core_web_sm (3.2.0)

I have no idea how to get the version of Prodigy, but rerunning pip install prodigy -f https://*@download.prodi.gy actually gives me this error (by the way, I installed Prodigy using this command about 3 days ago, so it should be the latest):

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
en-core-web-sm 3.2.0 requires spacy<3.3.0,>=3.2.0, but you have spacy 3.1.4 which is incompatible.
en-core-web-md 3.2.0 requires spacy<3.3.0,>=3.2.0, but you have spacy 3.1.4 which is incompatible.

I installed Prodigy first and then spaCy :frowning: that's why I hadn't seen this error at first.

@SofieVL thanks for pointing this out. I have rebuilt the environment from scratch by installing spaCy and Prodigy, validating that there are no errors, and downloading only the models that spaCy + Prodigy can support right now. The error has disappeared and binary teach works great now!

Thanks again and happy holidays!

You can run python -m prodigy stats to get the version number.

Anyway, Prodigy up until 1.11.6 was pinning spaCy to <3.2. That explains why rerunning the installation in your old environment throws the error: Prodigy tries to install spaCy 3.1.4, but because you had installed 3.2.1, you had also downloaded spaCy models for 3.2, so pip couldn't easily downgrade spaCy. As you've found, though, starting from scratch and letting Prodigy install spaCy from the start should fix your problems for now.

Of course we want you to be able to benefit from the newest spaCy releases! In fact, we recently encountered the issue you originally described, and we've already fixed it to make sure Prodigy works well with the latest spaCy version. We're now working on a new release of Prodigy that will be compatible with spaCy 3.2. It will be available soonish :wink:

Since Prodigy is so tightly coupled with spaCy, wouldn't it be great to include the supported spaCy versions, or something like the spacy validate output, here?

Version          1.11.6
Location         /Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/prodigy
Prodigy Home     /Users/fed/.prodigy
Platform         macOS-11.6.1-x86_64-i386-64bit
Python Version   3.9.2
Database Name    SQLite
Database Id      sqlite
Total Datasets   6
Total Sessions   28

This is a nice idea! We'd just have to think about how best to implement it, since the spaCy version range is typically only defined in Prodigy's package requirements (and we wouldn't want to duplicate this configuration, so it doesn't go out of sync). It's definitely possible to retrieve this info via importlib.metadata, though, so we can try that!
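
For example, something along these lines might work (just a rough sketch, using packaging to parse the requirement strings; the exact pin depends on the installed build):

from importlib.metadata import requires
from packaging.requirements import Requirement

# Look up the dependencies declared by the installed prodigy distribution
# and pull out the spaCy version range, e.g. something like "<3.2.0,>=3.0.0".
spacy_pin = next(
    (str(Requirement(req).specifier)
     for req in (requires("prodigy") or [])
     if Requirement(req).name == "spacy"),
    None,
)
print(spacy_pin)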