
Property record price lookup official site_Professional development features_Most commonly used web page design software_Wuxi SEO

2025/1/6 7:43:11 Source: https://blog.csdn.net/shizheng_Li/article/details/144804007

Loading an Arrow file from a local path and printing it

My custom dataset is saved at this path: ~/.cache/huggingface/datasets/allenai___tulu-3-sft-mixture2/default/0.0.0/55e9fd6d41c3cd1a98270dff07557bc2a1e1ba91/tulu-3-sft-mixture-train-00000-of-00001.arrow. I want to check whether it can be read and print the first record.
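Before loading, it can help to confirm that the path actually resolves to a file. The following is a minimal sketch, not part of the original post: resolve_arrow_file is a hypothetical helper name, and it assumes the "~" in the path has to be expanded by hand with os.path.expanduser.

```python
import os


def resolve_arrow_file(directory, filename):
    """Expand "~" and return the absolute file path, or raise if the file is missing."""
    path = os.path.abspath(os.path.join(os.path.expanduser(directory), filename))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Arrow file not found: {path}")
    return path
```

Calling resolve_arrow_file with the dataset folder and the .arrow file name either returns the full path or fails early with a clear message, which is easier to debug than an error raised deep inside the loader.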


import os

from datasets import load_dataset

# Set the path to the folder that contains the Arrow files.
# expanduser resolves "~" explicitly instead of relying on load_dataset to do it.
dataset_path = os.path.expanduser("~/.cache/huggingface/datasets/allenai___tulu-3-sft-mixture2/default/0.0.0/55e9fd6d41c3cd1a98270dff07557bc2a1e1ba91")

# Load the Arrow file by naming it in data_files
dataset = load_dataset(dataset_path, data_files="tulu-3-sft-mixture-train-00000-of-00001.arrow", split="train")

# Print the dataset summary
print(dataset)
# Generating train split: 1000 examples [00:00, 32709.99 examples/s]
# Dataset({
#     features: ['id', 'messages', 'source'],
#     num_rows: 1000
# })

# Print the first record
print(dataset[0])

Printed result:

{"id": "ai2-adapt-dev/flan_v2_converted_21688","messages": [{"content": "Given the task definition, example input & output, solve the new input case.\nGiven a text passage as input comprising of dialogue of negotiations between a seller and a buyer about the sale of an item, your task is to classify the item being sold into exactly one of these categories: 'housing', 'furniture', 'bike', 'phone', 'car', 'electronics'. The output should be the name of the category from the stated options and there should be exactly one category for the given text passage.\nExample: Seller: hi\nBuyer: Hello\nSeller: do you care to make an offer?\nBuyer: The place sounds nice, but may be a little more than I can afford\nSeller: well how much can you soend?\nBuyer: I was looking for something in the 1500-1600 range\nSeller: That is really unreasonable considering all the immenities and other going rates, you would need to come up to at least 3000\nBuyer: I have seen some 2 bedrooms for that price, which I could split the cost with a roommate, so even with amenities, this may be out of my range\nSeller: it may be then... the absolute lowest i will go is 2700. that is my final offer.\nBuyer: Ok, I think the most I could spend on this is 2000 - we are a ways apart\nSeller: ya that is far too low like i said 2700\nBuyer: Ok, thanks for your consideration. I will have to keep looking for now.\nSeller: good luck\nOutput: housing\nThe answer 'housing' is correct because a house is being talked about which is indicated by the mention of 'bedrooms' and 'amenities' which are words that are both related to housing.\n\nNew input case for you: Buyer: hi\nSeller: hello. Are you interested in the table for sale?\nBuyer: Yes, can you tell me , is your home smoke free?\nSeller: yes it is. I do , however have pets.\nBuyer: they never peed on it or anything, right?\nSeller: No they have not. 
There is a small broken corner but you cannot tell when it is against the wall\nBuyer: Do you think you would be willing to bring down the price since it is somewhat damaged?\nSeller: Maybe a little, however It is fairly new so I would like to get close to what i paid for\nBuyer: well a used product can rarely fetch the price paid, I was thinking  $59 and I can come pick it up myself.\nSeller: I would consider $75\nBuyer: I feel that is too much since it is damaged already and have to try to repaint and repair it.\nSeller: The paint is very fresh and does not need a new coat. The price I paid last year was $300 so you are getting it for less than 1/3\nBuyer: I need to paint it if there is a chip, either I can do 60 cash and come get it, or I will have to pass.\nSeller: Okay we can do that.\nBuyer: \nSeller: \n\nOutput: ","role": "user"},{"content": "furniture","role": "assistant"}],"source": "ai2-adapt-dev/flan_v2_converted"}
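A record like this is hard to read as a raw dictionary. Below is a small sketch of one way to print it more compactly; format_example and max_chars are hypothetical names introduced here, and sample is a shortened stand-in for dataset[0], not the real record.

```python
def format_example(example, max_chars=80):
    """Return a short, readable summary of one record with id, source and messages."""
    lines = [f"id: {example['id']}", f"source: {example['source']}"]
    for message in example["messages"]:
        # Collapse newlines and repeated spaces so each turn stays on one line
        text = " ".join(message["content"].split())
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        lines.append(f"[{message['role']}] {text}")
    return "\n".join(lines)


# Hypothetical stand-in for dataset[0], shortened for illustration
sample = {
    "id": "ai2-adapt-dev/flan_v2_converted_21688",
    "messages": [
        {"content": "Given the task definition, example input & output,\nsolve the new input case.", "role": "user"},
        {"content": "furniture", "role": "assistant"},
    ],
    "source": "ai2-adapt-dev/flan_v2_converted",
}
print(format_example(sample))
```

With the real dataset, format_example(dataset[0]) should work the same way, assuming every record has the id, messages and source fields listed in the dataset summary.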

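What about loading many Arrow files?

Take this path as an example: ~/.cache/huggingface/datasets/allenai___tulu-3-sft-mixture/default/0.0.0/55e9fd6d41c3cd1a98270dff07557bc2a1e1ba91. What is the problem here? During use, intermediate cache*.arrow files are created in the same folder (to speed up processing), so we need to specify exactly which Arrow files we want. The code below shows how.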


The code is as follows:

import os

from datasets import load_dataset

# Set the dataset folder (with "~" expanded) and the files to load
dataset_path = os.path.expanduser("~/.cache/huggingface/datasets/allenai___tulu-3-sft-mixture/default/0.0.0/55e9fd6d41c3cd1a98270dff07557bc2a1e1ba91")
selected_files = [
    "tulu-3-sft-mixture-train-00000-of-00006.arrow",
    "tulu-3-sft-mixture-train-00001-of-00006.arrow",
    "tulu-3-sft-mixture-train-00002-of-00006.arrow",
    "tulu-3-sft-mixture-train-00003-of-00006.arrow",
    "tulu-3-sft-mixture-train-00004-of-00006.arrow",
    "tulu-3-sft-mixture-train-00005-of-00006.arrow",
]

# Load the dataset; without split=..., the result is a DatasetDict with a "train" key
dataset = load_dataset(dataset_path, data_files=selected_files)

This loads many Arrow files from a single folder; you only need to name them in the list.
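Typing every shard name by hand gets tedious as the shard count grows. The sketch below builds the list automatically; it is an illustration rather than the method from the original post. select_shard_files and load_shards are hypothetical helper names, and the filter assumes that shard files contain "-train-" in their names and that intermediate files start with "cache".

```python
import glob
import os


def select_shard_files(directory):
    """Return the sorted train shard file names in a folder, skipping cache*.arrow files."""
    directory = os.path.expanduser(directory)
    pattern = os.path.join(directory, "*-train-*.arrow")
    names = [os.path.basename(path) for path in glob.glob(pattern)]
    # Intermediate files are assumed to start with "cache"; drop them
    return sorted(name for name in names if not name.startswith("cache"))


def load_shards(directory):
    """Load every selected shard as one train split with load_dataset."""
    from datasets import load_dataset  # imported lazily so the helper above needs only the standard library

    directory = os.path.expanduser(directory)
    return load_dataset(directory, data_files=select_shard_files(directory), split="train")
```

It is worth printing the result of select_shard_files on the folder first and comparing it with the expected shard count (six in the example above) before calling load_shards.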

Postscript

Written in Shanghai on December 29, 2024, at 13:24.
