After training a model on 200 examples, I can't do binary teach: when running the command below I get an error. Any ideas what's going wrong? (Searching doesn't help, and I'm not sure exactly what's wrong with my dataset.)
prodigy ner.teach ner_st2_skills ./model/model-best ./data.jsonl --label SKILL
Using 1 label(s): SKILL
Traceback (most recent call last):
File "/Users/fed/.pyenv/versions/3.9.2/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/fed/.pyenv/versions/3.9.2/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/prodigy/__main__.py", line 61, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src/prodigy/core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/prodigy/recipes/ner.py", line 71, in teach
model = EntityRecognizer(nlp, label=label)
File "cython_src/prodigy/models/ner.pyx", line 340, in prodigy.models.ner.EntityRecognizer.__init__
File "cython_src/prodigy/util.pyx", line 621, in prodigy.util.copy_nlp
File "spacy/vocab.pyx", line 90, in spacy.vocab.Vocab.vectors.__set__
AttributeError: 'NoneType' object has no attribute 'strings'
My data.jsonl was created with PhraseMatcher like this (from previously labelled data):
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span

for obj in data:
    doc = nlp(obj['text'])
    # Match the previously labelled skill phrases for this listing, case-insensitively
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    matcher.add("SKILL", [nlp.make_doc(cls['value'])
                          for cls in filterSkillsByConfidence(skills[obj['meta']['listingId']])])
    matches = matcher(doc)
    entities = []
    for match_id, start, end in matches:
        entities.append(Span(doc, start, end, label='SKILL'))
    # Drop overlapping spans before assigning them as entities
    doc.ents = spacy.util.filter_spans(entities)
    obj["spans"] = [{"token_start": ent.start,
                     "token_end": ent.end - 1,
                     "start": ent.start_char,
                     "end": ent.end_char,
                     "text": ent.text,
                     "label": ent.label_} for ent in doc.ents]
And whenever I try to run:
poetry run python -m prodigy ner.correct ner_st2_skills ./model/model-best ./data.jsonl --label SKILL
The model you're using isn't setting sentence boundaries (e.g. via the parser or sentencizer). This means that incoming examples won't be split into sentences.
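That warning just means the pipeline has no component that sets sentence boundaries, so incoming examples are shown unsegmented. One possible way to address it, sketched under the assumption that ./model/model-best is the pipeline from the command above (the output path is just an example), is to add a rule-based sentencizer and save a copy:

import spacy

# Sketch only: add a sentencizer so Prodigy can split incoming examples into sentences
nlp = spacy.load("./model/model-best")
if not (nlp.has_pipe("parser") or nlp.has_pipe("senter") or nlp.has_pipe("sentencizer")):
    nlp.add_pipe("sentencizer", first=True)
nlp.to_disk("./model/model-best-with-sents")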
============================== Info about spaCy ==============================
spaCy version 3.2.1
Location /Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/spacy
Platform macOS-11.6.1-x86_64-i386-64bit
Python version 3.9.2
Pipelines en_core_web_md (3.2.0), en_core_web_sm (3.2.0)
I have no idea how to get the version of Prodigy, but rerunning pip install prodigy -f https://*@download.prodi.gy
actually gives me this error (btw, I installed Prodigy with this command ~3 days ago, so it should be the latest):
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
en-core-web-sm 3.2.0 requires spacy<3.3.0,>=3.2.0, but you have spacy 3.1.4 which is incompatible.
en-core-web-md 3.2.0 requires spacy<3.3.0,>=3.2.0, but you have spacy 3.1.4 which is incompatible.
I installed Prodigy first and then spaCy, which is why I didn't see this error at first.
@SofieVL thanks for pointing this out! I have rebuilt the env from scratch by installing spaCy and Prodigy, validating that there are no errors, and downloading the specific models that spaCy + Prodigy currently support. Looks like the error disappeared and binary teach works great now!
Thanks again and happy holidays!
You can run python -m prodigy stats to get the version number.
Anyway, Prodigy up until 1.11.6 was pinning spaCy to <3.2. That explains why rerunning the installation in your old environment throws the error: Prodigy tries to install spaCy 3.1.4, but because you had installed spaCy 3.2.1, you had also downloaded spaCy models for 3.2, so pip couldn't easily downgrade spaCy. But as you've found, starting from scratch and letting Prodigy install spaCy from the start should fix your problems for now.
Of course we want you to be able to benefit from the newest spaCy releases! In fact, we recently encountered the issue you originally described, and we've already fixed it to make sure Prodigy works well with the latest spaCy version. We're now working on a new release of Prodigy that will be compatible with spaCy 3.2. It will be available soonish!
Since Prodigy is so tightly coupled with spaCy, wouldn't it be great to show the supported spaCy versions, or the spacy validate output, here?
Version 1.11.6
Location /Users/fed/Library/Caches/pypoetry/virtualenvs/nel-riFBMyAx-py3.9/lib/python3.9/site-packages/prodigy
Prodigy Home /Users/fed/.prodigy
Platform macOS-11.6.1-x86_64-i386-64bit
Python Version 3.9.2
Database Name SQLite
Database Id sqlite
Total Datasets 6
Total Sessions 28
This is a nice idea! We'd just have to think about how to best implement this, since the spaCy version range is typically only defined in Prodigy's package requirements (and we wouldn't want to duplicate this configuration so it doesn't go out of sync). It's definitely possible to retrieve this info via importlib.metadata, though, so we can try that!
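As a rough sketch of that idea (assuming the packaging library is available, as it is alongside pip and spaCy), the declared spaCy pin could be read straight from Prodigy's installed metadata, so it never drifts from the requirements:

from importlib.metadata import requires, version
from packaging.requirements import Requirement

# Read the spaCy requirement declared in Prodigy's package metadata and
# compare it against the spaCy version that's actually installed.
spacy_pin = next(r for r in map(Requirement, requires("prodigy")) if r.name == "spacy")
print(f"prodigy {version('prodigy')} requires spacy {spacy_pin.specifier}")
print(f"spacy installed: {version('spacy')}")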