Prompt Lookup Decoding

Official code: GitHub - apoorvumang/prompt-lookup-decoding

UPDATE 2: This method is now available in vLLM as well by setting speculative_model="[ngram]" 🥳
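A minimal sketch of what this looks like in vLLM, assuming a version whose LLM constructor accepts these arguments (the model name is a placeholder, and the argument names num_speculative_tokens and ngram_prompt_lookup_max reflect the vLLM docs around the time of this update; newer releases may expose them differently, so check your version's documentation):

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder; any supported model
    speculative_model="[ngram]",            # use prompt-lookup (n-gram) drafting
    num_speculative_tokens=10,              # how many candidate tokens to draft per step
    ngram_prompt_lookup_max=3,              # max n-gram size to match against the prompt
)
outputs = llm.generate(
    ["Summarize the following document: ..."],
    SamplingParams(temperature=0.0, max_tokens=256),
)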

UPDATE: This has been added to the transformers library. Please see this for a code example, or simply add prompt_lookup_num_tokens=10 to your model.generate(...) call.
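For reference, a minimal usage sketch with transformers (the checkpoint name is a placeholder; any causal LM that works with generate() should do):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = "Summarize the following article:\n..."  # an input-grounded task
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# prompt_lookup_num_tokens switches assisted generation to prompt-lookup drafting
out = model.generate(**inputs, max_new_tokens=256, prompt_lookup_num_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))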

TLDR: We modify speculative decoding by replacing the draft model with simple string matching against the prompt to generate candidate token sequences. This results in significant speedups (2x-4x) on input-grounded tasks, with no effect on output quality. The method works with any decoder model, requires no model changes or external datastore, and supports both greedy decoding and sampling.

Intuition: In several LLM use cases involving input-grounded generation (summarization, document QA, multi-turn chat, code editing), there is high n-gram overlap between the LLM input (prompt) and the LLM output. These overlaps can be entity names, phrases, or code chunks that the LLM copies directly from the input while generating the output. Prompt lookup exploits this pattern to speed up autoregressive decoding in LLMs.

import torch

def find_candidate_pred_tokens(input_ids, max_ngram_size=3, num_pred_tokens=10):
    input_length = input_ids.size(1)

    for ngram_size in range(max_ngram_size, 0, -1):
        # Extract the last n tokens as our search ngram
        ngram = input_ids[0, -ngram_size:].tolist()

        # Create sliding windows of size ngram_size
        windows = input_ids.unfold(dimension=1, size=ngram_size, step=1)

        # Convert ngram to a tensor for comparison
        ngram_tensor = torch.tensor(ngram, device=input_ids.device).unsqueeze(0)

        # Find where the windows match the ngram
        matches = (windows == ngram_tensor).all(dim=2)

        # Get the indices of matches
        match_indices = matches.nonzero(as_tuple=True)[1]

        # Iterate through match indices to find a valid continuation
        for idx in match_indices:
            start_idx = idx + ngram_size
            end_idx = start_idx + num_pred_tokens
            # Ensure we don't go beyond the length of input_ids and avoid self-match
            if end_idx <= input_length and start_idx < input_length - ngram_size:
                return input_ids[0, start_idx:end_idx]

    # If no match is found, return an empty tensor
    return torch.tensor([], dtype=torch.long, device=input_ids.device)
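To show how these candidate tokens slot into the decoding loop, here is a minimal greedy verification sketch (not the official implementation; model is assumed to be any Hugging Face-style causal LM that returns logits). The candidates are appended to the prompt, scored in a single forward pass, and kept only up to the first position where the model's own greedy choice disagrees:

import torch

@torch.no_grad()
def greedy_step_with_prompt_lookup(model, input_ids, max_ngram_size=3, num_pred_tokens=10):
    # Draft candidate tokens by string matching in the prompt (function defined above)
    candidates = find_candidate_pred_tokens(input_ids, max_ngram_size, num_pred_tokens)

    # Verify the prompt plus all candidate tokens in one forward pass
    extended = torch.cat([input_ids, candidates.unsqueeze(0)], dim=1)
    logits = model(extended).logits

    # Greedy predictions for each candidate position, plus one extra token at the end
    preds = logits[0, input_ids.size(1) - 1:, :].argmax(dim=-1)

    # Accept candidates only while they agree with the model's own greedy choice
    n_accept = 0
    for i in range(candidates.size(0)):
        if preds[i].item() == candidates[i].item():
            n_accept += 1
        else:
            break

    # The first disagreeing (or the extra) prediction is always a valid new token
    accepted = preds[: n_accept + 1]
    return torch.cat([input_ids, accepted.unsqueeze(0)], dim=1)

When no match is found, candidates is empty and this degenerates to ordinary one-token-at-a-time greedy decoding, which is why output quality is unaffected.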

TODOs/Thoughts/Future work

  • There are probably better ways to do string matching than the current one, and there are several obvious things to improve, e.g. what to do when there are multiple matches? What's the ideal length of continuation?
  • We haven't yet tried sampling, although there's no reason it shouldn't work.
    • Here, one additional thing to test would be whether prompt lookup while sampling can affect hallucination rates, since this artificially increases the probability of sampling exact sequences from the input (this was suggested by my colleague Shwetha S)
  • Testing actual FLOPs impact and tradeoffs is needed
  • Also need to figure out the best hyperparameters; 3 and 10 were chosen with very little testing
  • It would be an interesting challenge to design the "best lookup function" for decoding, could even be a competition?

This method may still have issues: as the author notes, hallucination is a risk, and an n-gram match in the prompt does not necessarily translate into a speedup.
