

Couldn't build proto file into descriptor pool: duplicate file name when using upb python implementation #13745
Atheuz opened this issue Jan 9, 2023 · 11 comments

What version of protobuf and what language are you using?
Version: 4.21.12
Language: Python

What operating system (Linux, Windows, ...) and version?

What runtime / compiler are you using (e.g., python version or gcc version)
python: 3.10
buf: 1.11.0
libprotoc: 3.21.12

What did you do?
Steps to reproduce the behavior:

  • Go to https://github.com/Atheuz/test-protobuf-schema-error
  • Clone the repo
  • Cd into the repo directory
  • Create a virtualenv: virtualenv .venv --python=3.10
  • Activate the virtualenv: source .venv/bin/activate
  • Install dependencies: pip install -r requirements.txt
  • Run pytest: pytest
  • See that when PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=upb, pytest fails to run.
  • Change PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=upb to PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python in pytest.ini.
  • Run pytest: pytest
  • See that when PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, pytest runs without issue.
  • Similarly, if you downgrade to protobuf==3.20.3 and run the tests with PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp or PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, they also succeed. Only PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=upb, which became available in 4.x, fails.
    What did you expect to see?

    I expected upb to behave the same as the python implementation, i.e. to work without issue.

    What did you see instead?

    The basic error that I get is: TypeError: Couldn't build proto file into descriptor pool: duplicate file name (google/protobuf/descriptor.proto)

    More detail can be seen in the error.txt file here: https://github.com/Atheuz/test-protobuf-schema-error/blob/master/error.txt

    Anything else we should know about your project / environment

    What's happening is that we go into common/terms_pb2.py and manually replace from google.protobuf import descriptor_pb2 as google_dot_protobuf_dot_descriptor__pb2 with from google_test.protobuf import descriptor_pb2 as google_dot_protobuf_dot_descriptor__pb2.
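The manual edit described above amounts to a one-line find/replace on the generated module. A minimal sketch, assuming the import line has exactly the form protoc emitted in this repo; `redirect_descriptor_import` and its `package` parameter are hypothetical names, not part of protobuf or buf, and this only illustrates the edit rather than recommending it:

```python
import re

# Matches the descriptor.proto import that protoc writes into generated modules, e.g.
# "from google.protobuf import descriptor_pb2 as google_dot_protobuf_dot_descriptor__pb2".
# Other google.protobuf imports (descriptor, symbol_database, ...) are left alone.
_DESCRIPTOR_IMPORT = re.compile(
    r"^from google\.protobuf import descriptor_pb2 as (\w+)$", re.MULTILINE
)


def redirect_descriptor_import(source: str, package: str = "google_test.protobuf") -> str:
    """Return `source` with its descriptor_pb2 import pointed at `package` (hypothetical helper)."""
    return _DESCRIPTOR_IMPORT.sub(
        lambda m: f"from {package} import descriptor_pb2 as {m.group(1)}", source
    )


if __name__ == "__main__":
    generated = (
        "from google.protobuf import descriptor as _descriptor\n"
        "from google.protobuf import descriptor_pb2 as google_dot_protobuf_dot_descriptor__pb2\n"
    )
    print(redirect_descriptor_import(generated))
```

Applying it to common/terms_pb2.py only reproduces the mixed-import setup described in this issue.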

    Note that we do not make this replacement in common/options_pb2.py; this conflict between two different imports is necessary for the error to appear. If we also make the replacement in common/options_pb2.py, the error disappears and everything works fine, but that is not what we do on our end: we make the replacement in only one file, not in all files.
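Why mixing the two imports matters can be illustrated with a toy model: both descriptor_pb2 modules register a file named google/protobuf/descriptor.proto, but with different serialized contents. The sketch assumes a pool that accepts an identical re-registration but rejects a second, different file under the same name, which mirrors the behaviour reported here for upb. `ToyDescriptorPool`, `import_modules` and the placeholder byte strings are hypothetical; this is not upb's actual code.

```python
class ToyDescriptorPool:
    """Hypothetical model, not upb's code: files are indexed by name."""

    def __init__(self) -> None:
        self._files: dict = {}  # file name -> serialized FileDescriptorProto bytes

    def add_serialized_file(self, name: str, serialized: bytes) -> None:
        existing = self._files.get(name)
        if existing is None:
            self._files[name] = serialized
        elif existing != serialized:
            # Same name, different content: refuse rather than pick one silently.
            raise TypeError(
                "Couldn't build proto file into descriptor pool: "
                f"duplicate file name ({name})"
            )
        # Otherwise the content is identical and re-adding it is a no-op.


NAME = "google/protobuf/descriptor.proto"
# Placeholder bytes standing in for the two different serialized descriptor.proto files.
LIBRARY_COPY = b"descriptor.proto as shipped in the protobuf package"
CUSTOM_COPY = b"descriptor.proto as generated with buf into google_test"


def import_modules(options_copy: bytes, terms_copy: bytes):
    """Simulate importing common/options_pb2.py, then common/terms_pb2.py.

    Each argument is the descriptor.proto copy that module's import pulls in.
    Returns None on success, otherwise the error message.
    """
    pool = ToyDescriptorPool()  # stands in for the process-wide default pool
    try:
        pool.add_serialized_file(NAME, options_copy)
        pool.add_serialized_file(NAME, terms_copy)
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(import_modules(CUSTOM_COPY, CUSTOM_COPY))   # both files redirected
    print(import_modules(LIBRARY_COPY, CUSTOM_COPY))  # only terms_pb2 redirected
```

In the model, redirecting both imports (or neither) registers a single copy and succeeds, while redirecting only one produces the same message as the one reported in this issue.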

    The descriptor_pb2.py file was generated using buf version 1.11.0 and protoc version 3.21.12 with the following commands:

    cd build
    buf generate --config=buf.yaml --template=buf.gen.yaml
    cp google/protobuf/descriptor_pb2.py ../google_test/protobuf/descriptor_pb2.py
    

    Our organization makes this replacement in common/terms_pb2.py because of the Confluent Kafka Schema Registry: apparently, if we don't, the schema doesn't match what we have in our Confluent Kafka Schema Registry.

    Note that it works fine in protobuf 3.20.3 with both the cpp and python implementations, and in protobuf 4.21.12 with the python implementation. Only the upb implementation has an issue with this edit.
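Because the outcome depends on which backend is loaded, it can help to confirm what a test run actually used. A minimal sketch, assuming protobuf 4.x: it relies on `google.protobuf.internal.api_implementation.Type()`, which lives in an internal module and may change between releases; `active_protobuf_backend` is a hypothetical helper name.

```python
import os


def active_protobuf_backend():
    """Return 'upb', 'cpp' or 'python', or None if protobuf is not installed."""
    try:
        # Internal module, not a public API; it may change between releases.
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return api_implementation.Type()


if __name__ == "__main__":
    requested = os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "<unset>")
    print(f"requested: {requested}, active: {active_protobuf_backend()}")
```

Calling it from a test or conftest.py shows whether the value set in pytest.ini took effect. The environment variable is read when protobuf's api_implementation module is first imported, so changing it later in the same process does not switch backends.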

    One quirk of 4.21.x is that Python proto code generated with older versions of protoc isn't compatible with it. I followed your repro steps but regenerated the *_pb2.py files instead, and the tests pass. Diffing the files, I found:

    -#from google.protobuf import descriptor_pb2 as google_dot_protobuf_dot_descriptor__pb2  # works
    -from google_test.protobuf import descriptor_pb2 as google_dot_protobuf_dot_descriptor__pb2  # broken
    +from google.protobuf import descriptor_pb2 as google_dot_protobuf_dot_descriptor__pb2
    

    It looks like you're trying to insert your own descriptor into generated code? I'm kind of surprised this ever worked. What are you trying to do?

    @mkruskal-google yes, we're inserting our own descriptor into the generated code. My understanding is that this is necessary for our Confluent Kafka Schema Registry lookups to work correctly.

    But the underlying issue is that this works fine with the python and cpp implementations on 3.20.3, and with the python implementation on 4.21.12 as well. It fails only with the upb implementation.

    Yes, I'm aware.

    The problem is that implementations other than upb don't have an issue with this. I'm also looking into whether we can get rid of this internally, but for years it has worked fine with both the cpp and python implementations, and it broke only with the 4.21 release, when the default switched to upb.

    Here's the internal user story:

    "I want to ensure our users can pick up their code bindings and be able to use them without any modifications
    So that our developer experience is as seamless as possible, and communication with Kafka and the schema registry works correctly.

    Notes:
    We currently need to make some adjustments to our Python code bindings before they are pushed to our local PyPI. This ensures producers/consumers use schemas that match the ones produced in the schema registry.

    The main issue is that the protobuf library for Python includes Google's common code bindings as well, and these take precedence over our own descriptor.proto. Unfortunately, the internal descriptor fields seem to differ, and the schema registry doesn't like that.

    By tweaking our generated code bindings, however, we can make this work. Steps to modify the code bindings:

    AFTER generating the python code bindings, we should:

  • rename the google/protobuf folder to google_test/protobuf
  • in any generated code binding that imports from our google.protobuf, change the import to point to google_test/protobuf; only imports of files we own need to be changed"
    Here are the descriptors in a zip file: ours is called our_descriptor_pb2.py, and Google's is called your_descriptor_pb2.py.
    descriptors.zip
    Note that the Google descriptor is 107 KB and our descriptor is 119 KB.

    Our descriptor was generated using buf 1.11.0 and protoc 3.21.12, as seen here: https://github.com/Atheuz/test-protobuf-schema-error/tree/master/build.

    Note that if you generate it with protoc 3.21.12 in the build/external/google/protobuf directory using the command protoc -I=. --python_out=. descriptor.proto, you end up with a small descriptor_pb2.py file of about 16 KB, whereas generating it with buf in the build directory using the command buf generate --config=buf.yaml --template=buf.gen.yaml gives the 119 KB descriptor_pb2.py file.
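One way to check whether two descriptor_pb2.py files would register different content under the same file name is to compare the serialized bytes they embed. A sketch using only the standard library, assuming the protobuf 4.x generated-code layout, where the module calls `_descriptor_pool.Default().AddSerializedFile(b'...')`; code generated in an older layout passes the bytes differently and would yield None here. `embedded_descriptor_digest` and the script name compare_descriptors.py are hypothetical.

```python
import ast
import hashlib


def embedded_descriptor_digest(pb2_source: str):
    """Return the SHA-256 hex digest of the bytes passed to AddSerializedFile(...)
    in the source of a generated *_pb2.py module, or None if there is no such call."""
    for node in ast.walk(ast.parse(pb2_source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "AddSerializedFile"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, bytes)
        ):
            return hashlib.sha256(node.args[0].value).hexdigest()
    return None


if __name__ == "__main__":
    import sys

    # Usage: python compare_descriptors.py our_descriptor_pb2.py your_descriptor_pb2.py
    digests = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            digests.append(embedded_descriptor_digest(f.read()))
        print(path, digests[-1])
    if len(digests) == 2:
        print("identical" if digests[0] == digests[1] else "different")
```

Running it on our_descriptor_pb2.py and your_descriptor_pb2.py from descriptors.zip would show whether the two modules embed identical bytes for what both register as google/protobuf/descriptor.proto.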

    I'm not aware of any feature that supports swapping out descriptor.proto (other than a post-generation find/replace). What are you trying to change in it?

    Hi, I have the same issue. I have a project that needs to be updated with a proto schema, so I have a job that updates and replaces the schema. When the proto changes, I get the following error, but if it does not change, it works fine!

    TypeError: Couldn't build proto file into descriptor pool: duplicate symbol
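A hedged reading of this follow-up, assuming the job re-registers the regenerated schema in a process that has already loaded the previous version: an "identical is fine, different is rejected" rule would then apply to individual symbols. The toy model below uses hypothetical names (`ToySymbolPool`, `add_or_report`, and a made-up message `my.pkg.Event`) and is not protobuf's code. It shows an unchanged schema accepted again, a changed one rejected in the same pool, and the changed one loading into a fresh, empty pool. The real error text may include more detail than the prefix shown in the report.

```python
class ToySymbolPool:
    """Hypothetical model, not protobuf's code: each fully-qualified symbol
    may have only one definition per pool."""

    def __init__(self) -> None:
        self._symbols: dict = {}  # symbol name -> serialized definition

    def add(self, symbols: dict) -> None:
        # Check everything first so a rejected add leaves the pool unchanged.
        for name, definition in symbols.items():
            known = self._symbols.get(name)
            if known is not None and known != definition:
                raise TypeError(
                    "Couldn't build proto file into descriptor pool: "
                    f"duplicate symbol '{name}'"
                )
        self._symbols.update(symbols)


def add_or_report(pool: ToySymbolPool, symbols: dict):
    """Return None if the symbols were added, otherwise the error message."""
    try:
        pool.add(symbols)
    except TypeError as err:
        return str(err)
    return None


# Two versions of a made-up message; the byte strings stand in for real descriptors.
V1 = {"my.pkg.Event": b"string id = 1;"}
V2 = {"my.pkg.Event": b"string id = 1; int64 ts = 2;"}

if __name__ == "__main__":
    long_running = ToySymbolPool()
    print(add_or_report(long_running, V1))     # first load
    print(add_or_report(long_running, V1))     # unchanged schema: accepted again
    print(add_or_report(long_running, V2))     # changed schema, same pool: rejected
    print(add_or_report(ToySymbolPool(), V2))  # empty pool: accepted
```

With generated *_pb2 modules, which register into the default pool, a fresh pool in practice usually means a fresh interpreter process.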
              

    We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

    This issue is labeled inactive because the last activity was over 90 days ago.

    We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

    This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.