Is it possible to interrupt a completion stream and not waste tokens? E.g., when I see that it's looping or going in the wrong direction.

I know I can use the stream option and then use the response object like a generator.

response = openai.Completion.create(
    # model, prompt, etc.
    stream=True,
)
for line in response:
    print(line)

But is it enough to just jump out of the loop when I decide enough is enough? Will the server then stop generating the rest of the tokens?

I’m looking for something like the option to interrupt in the playground. You can just cancel the stream if you see that it’s going in the wrong direction, though I’m not sure if this is really telling the server to stop computing.

I’m not looking for Stop Sequences. I just want the user of my app to have the ability to quickly stop and try something else (and possibly not waste all tokens).

According to my research, this should do the trick, since openai.Completion.create uses requests under the hood:

response = None
try:
    response = openai.Completion.create(
        # Other stuff...
        stream=True,
    )
    for stream_resp in response:
        # Do stuff...
        if thing_happens:
            break
except Exception as e:
    print(e)
finally:
    if response is not None:
        response.close()
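Since the point is that the SDK streams over requests, here is a rough sketch of the same cancellation done against the HTTP endpoint directly; the payload is only an example, and the break-after-20-chunks condition stands in for a real user cancel:

import json
import os

import requests

resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "text-davinci-003", "prompt": "Say this is a test",
          "max_tokens": 500, "stream": True},
    stream=True,
)
received = 0
try:
    for line in resp.iter_lines():
        if not line or line == b"data: [DONE]":
            continue
        chunk = json.loads(line[len(b"data: "):])
        print(chunk["choices"][0]["text"], end="", flush=True)
        received += 1
        if received >= 20:  # stand-in for "the user pressed cancel"
            break
finally:
    resp.close()  # dropping the HTTP connection is the actual cancel signal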

I came up with the same solution, and it works on my end as well. While I'm sure the server will generate at least a few more tokens than the client actually receives, what I was hoping for was some assertion from OpenAI (or from someone who has done meticulous testing with this method to determine whether they are charged for the total number of tokens that would have been generated) that once the client closes the connection, the server actually stops generating tokens.

I made a simple test of @thehunmonkgroup's solution.

I made a call to the gpt-3.5-turbo model with the input:

Please introduce GPT model structure as detail as possible

and let the API print all the tokens. The statistics from the OpenAI usage page were (I am a new user and not allowed to post media, so I can only copy the result):
17 prompt + 441 completion = 458 tokens

After that, I stopped the generation when the number of tokens received was 9; the result was:
17 prompt + 27 completion = 44 tokens

It seems roughly 18 extra tokens (27 billed minus the 9 received) were generated after I stopped the generation.

Then I stopped the generation when the count reached 100; the result was:
17 prompt + 111 completion = 128 tokens

So I think the solution works well, at the cost of an extra 10~20 tokens each time.
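For anyone who wants to reproduce this, here is a minimal sketch of such a test, assuming the pre-v1 openai SDK used elsewhere in this thread (N is the cut-off; the chunk count is a close proxy for the token count, since each delta is roughly one token):

import openai

# Stream a chat completion, stop after N streamed chunks, close the
# connection, then compare against the billed tokens on the usage page.
N = 9

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Please introduce GPT model structure as detail as possible"}],
    stream=True,
)
received = 0
try:
    for chunk in response:
        content = chunk["choices"][0]["delta"].get("content", "")
        print(content, end="", flush=True)
        received += 1
        if received >= N:
            break
finally:
    response.close()  # drop the connection so the server stops generating

print(f"\nReceived {received} chunks; compare with the billed completion tokens.")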

def ask_question(request):
    thread = ChatThread(request)
    thread_references[thread.getName()] = thread  # Store a reference to the thread
    thread.start()
    # Save the thread name to the database (I use a Django model)
    thread_reference = ThreadReference(thread_name=thread.getName())
    thread_reference.save()
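The view above references a ThreadReference model and a module-level thread_references dict that the post never shows; a minimal sketch of what they might look like, purely as assumptions:

from django.db import models

# Hypothetical definitions assumed by the view above.
thread_references = {}  # thread name -> ChatThread instance

class ThreadReference(models.Model):
    thread_name = models.CharField(max_length=36, unique=True)  # a UUID string
    created_at = models.DateTimeField(auto_now_add=True)

Note that thread_references lives in process memory, so this design only works while all requests land on the same worker process.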

The thread class

import json
import threading
import uuid

import openai
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer


class ChatThread(threading.Thread):

    def __init__(self, request):
        self.request = request
        self.response = None  # Store the response object so stop() can call response.close()
        self.stop_event = threading.Event()  # Event object used to signal the thread to stop
        self.thread_name = str(uuid.uuid4())  # Generate a unique thread name using a UUID
        threading.Thread.__init__(self, name=self.thread_name)

    def run(self):
        try:
            channel_layer = get_channel_layer()
            i = 0
            generated_content = []
            # Simple streaming ChatCompletion request
            self.response = openai.ChatCompletion.create(
                model='gpt-3.5-turbo',
                messages=[
                    {'role': 'user', 'content': self.request.data.get('question', '')}
                ],
                temperature=0,
                stream=True,
            )
            for chunk in self.response:
                if self.stop_event.is_set():
                    break  # stop() was called; leave the loop so the stream can close
                # time.sleep(3)  # you can use this to slow the stream down for testing
                content = chunk["choices"][0]["delta"].get("content", "")
                finish_reason = chunk["choices"][0].get("finish_reason", "")
                if finish_reason != "stop":
                    data = {"current_total": i, "content": content}
                else:
                    data = {"current_total": i, "content": "@@" + finish_reason + "@@"}
                generated_content.append(content)
                async_to_sync(channel_layer.group_send)(
                    f"chat_{self.request.data.get('chat_room', '')}",
                    {
                        # 'type' is the handler method called on the consumer
                        'type': 'send_notification',
                        # this is the value sent to the send_notification handler
                        'value': json.dumps(data),
                    }
                )
                i += 1
            combined_content = ''.join(generated_content)
        except Exception as e:
            print(e)

    def stop(self):
        self.stop_event.set()
        if self.response:
            self.response.close()  # Close the response if it exists
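The group_send above targets a Channels consumer with a send_notification handler, which the post does not include; a minimal hypothetical sketch of such a consumer might be:

import json
from channels.generic.websocket import AsyncWebsocketConsumer

class ChatConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self.room = self.scope["url_route"]["kwargs"]["chat_room"]
        await self.channel_layer.group_add(f"chat_{self.room}", self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard(f"chat_{self.room}", self.channel_name)

    async def send_notification(self, event):
        # Forward each streamed chunk to the websocket client
        await self.send(text_data=event["value"])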

To close the response and the thread, I use an API view from Django REST framework:

@api_view(['POST'])
def stop_thread(request):
    # thread_name is the UUID that was saved in your database
    thread_name = request.data.get('thread_name')

    # Get the thread reference from the database
    thread_reference = get_object_or_404(ThreadReference, thread_name=thread_name)
    if thread_name in thread_references:
        thread_references[thread_name].stop()
        del thread_references[thread_name]
        # thread_reference.delete()  # delete the thread name from the database if you use Django
    return Response({'message': f'Thread {thread_name} has been stopped.'})
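For completeness, a hypothetical client-side call to that endpoint; the URL path is an assumption, and the UUID is whatever you stored when the thread started:

import requests

# Hypothetical: POST the saved UUID to the stop endpoint.
requests.post(
    "https://your-app.example.com/api/stop_thread/",
    json={"thread_name": "the-saved-uuid"},
)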