Life Is Short, I Use Python: File Format Conversion with pandas

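Preface
Example 1: Converting between Excel and CSV
Methods for common formats
- Flat file
- Excel
- JSON
- XML
Example 2: Converting between common formats
- Requirements in brief
- Dependencies
- The export function
- The main function
Appendix: Methods for other formats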
- HTML
- Pickling
- Clipboard
- Latex
- HDFStore: PyTables (HDF5)
- Feather
- Parquet
- ORC
- SAS
- SPSS
- SQL
- Google BigQuery
- STATA

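Preface

pandas supports many file formats, and its I/O methods make it possible to convert between them. This article starts with an example that converts between Excel and CSV, then uses the file formats pandas supports to build a simple file format converter.

Example 1: Converting between Excel and CSV

A previous post converted Excel to CSV with pandas. The reverse direction, CSV to Excel, works just as well.

The sample code for both directions follows.

Excel to CSV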

import os

import pandas as pd


def export_csv(input_file, output_path):
    # Create an ExcelFile object
    with pd.ExcelFile(input_file) as xls:
        # Iterate over the worksheet names
        for i, sheet_name in enumerate(xls.sheet_names):
            # Read the worksheet into a DataFrame
            df = pd.read_excel(xls, sheet_name=sheet_name)
            output_file = os.path.join(output_path, f'{i + 1}-{sheet_name}.csv')
            # Write the DataFrame to a CSV file
            df.to_csv(output_file, index=False)

CSV to Excel

import pathlib

import pandas as pd


def export_excel(input_file, output_file=None):
    # If no output file is given, write next to the input file with an .xlsx suffix
    if not output_file:
        input_path = pathlib.Path(input_file)
        output_path = input_path.parent / (input_path.stem + '.xlsx')
        output_file = str(output_path)
    df = pd.read_csv(input_file)
    df.to_excel(output_file, index=False)

Methods for common formats

The following tables are taken from the Input/output section of the official pandas documentation.

Flat file

Method	Description
`read_table`(filepath_or_buffer, *[, sep, …])	Read general delimited file into DataFrame.
`read_csv`(filepath_or_buffer, *[, sep, …])	Read a comma-separated values (csv) file into DataFrame.
`DataFrame.to_csv`([path_or_buf, sep, na_rep, …])	Write object to a comma-separated values (csv) file.
`read_fwf`(filepath_or_buffer, *[, colspecs, …])	Read a table of fixed-width formatted lines into DataFrame.
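
To make the flat-file readers concrete, here is a minimal sketch that parses the same small table from comma-separated, tab-separated and fixed-width text. The sample data and variable names are made up for illustration, and `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# Comma-separated text parsed with read_csv.
csv_text = "name,age\nAlice,30\nBob,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same data, tab-separated; read_table defaults to sep="\t".
tsv_text = "name\tage\nAlice\t30\nBob\t25\n"
df_tsv = pd.read_table(io.StringIO(tsv_text))

# Fixed-width columns: "name" occupies characters 0-5, "age" characters 6-8.
# colspecs holds half-open (start, stop) character ranges.
fwf_text = "name  age\nAlice 30 \nBob   25 \n"
df_fwf = pd.read_fwf(io.StringIO(fwf_text), colspecs=[(0, 6), (6, 9)])

# With no path argument, to_csv returns the CSV text as a string.
csv_out = df_tsv.to_csv(index=False)
```

All three readers should produce the same two-row DataFrame here, so any of them can feed `DataFrame.to_csv`.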

Excel

Method	Description
`read_excel`(io[, sheet_name, header, names, …])	Read an Excel file into a `pandas` `DataFrame`.
`DataFrame.to_excel`(excel_writer, *[, …])	Write object to an Excel sheet.
`ExcelFile`(path_or_buffer[, engine, …])	Class for parsing tabular Excel sheets into DataFrame objects.
`ExcelFile.book`
`ExcelFile.sheet_names`
`ExcelFile.parse`([sheet_name, header, names, …])	Parse specified sheet(s) into a DataFrame.

Method	Description
`Styler.to_excel`(excel_writer[, sheet_name, …])	Write Styler to an Excel sheet.

Method	Description
`ExcelWriter`(path[, engine, date_format, …])	Class for writing DataFrame objects into excel sheets.

JSON

Method	Description
`read_json`(path_or_buf, *[, orient, typ, …])	Convert a JSON string to pandas object.
`json_normalize`(data[, record_path, meta, …])	Normalize semi-structured JSON data into a flat table.
`DataFrame.to_json`([path_or_buf, orient, …])	Convert the object to a JSON string.

Method	Description
`build_table_schema`(data[, index, …])	Create a Table schema from `data`.
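
As an illustration of the JSON methods, the sketch below flattens nested records with `json_normalize`, serializes them with `to_json`, and reads them back with `read_json`. The records are made-up sample data, and the JSON is kept in memory rather than written to a file.

```python
import io

import pandas as pd

# Nested records, as a web API might return them (made-up sample data).
records = [
    {"id": 1, "user": {"name": "Alice", "city": "Paris"}},
    {"id": 2, "user": {"name": "Bob", "city": "Oslo"}},
]

# json_normalize flattens nested dicts into dotted column names.
flat = pd.json_normalize(records)

# orient="records" writes a list of row objects; with no path, a string is returned.
# force_ascii=False keeps non-ASCII characters readable instead of escaping them.
json_text = flat.to_json(orient="records", force_ascii=False)

# Wrap the string in StringIO so read_json treats it as a file-like object.
restored = pd.read_json(io.StringIO(json_text), orient="records")
```

The flattened columns are `id`, `user.name` and `user.city`, which map directly onto CSV or Excel columns.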

XML

Method	Description
`read_xml`(path_or_buffer, *[, xpath, …])	Read XML document into a `DataFrame` object.
`DataFrame.to_xml`([path_or_buffer, index, …])	Render a DataFrame to an XML document.
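
The XML methods can be tried without extra packages by passing `parser="etree"`, which uses the standard library's `xml.etree` instead of the default `lxml`. The following is a minimal round-trip sketch with made-up data; it assumes a pandas version that provides `read_xml` and `to_xml` (1.3 or later).

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# parser="etree" uses the standard library's xml.etree instead of lxml.
# With no path argument, to_xml returns the XML document as a string.
xml_text = df.to_xml(index=False, parser="etree")

# Each <row> element becomes one DataFrame row; its child elements become columns.
restored = pd.read_xml(io.StringIO(xml_text), parser="etree")
```

By default `to_xml` wraps the rows in a `<data>` root with one `<row>` element per record, which is the layout `read_xml` expects by default.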

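Example 2: Converting between common formats

Using the I/O methods for the common formats, we can build a simple format converter.

Step 1: read the data from a file in the given format into a DataFrame.

Step 2: write the DataFrame to a file in the target format.

Requirements in brief

Choose the conversion automatically from the suffixes of the input and output file names, and print a message if a format is not supported.
Supported formats: csv, xlsx, json, xml.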
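
One way to express suffix-based selection is a pair of lookup tables that map each suffix to a reader and a writer. The sketch below is only an illustration: `READERS`, `WRITERS` and `convert` are hypothetical names, not part of pandas, and the file names are made up. The demo at the end uses only csv and json so it runs without openpyxl or lxml.

```python
import pathlib
import tempfile

import pandas as pd

# Hypothetical lookup tables (not part of pandas): suffix -> reader / writer.
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,
    ".xml": pd.read_xml,
}
WRITERS = {
    ".csv": lambda df, path: df.to_csv(path, index=False),
    ".json": lambda df, path: df.to_json(path, orient="records", force_ascii=False),
    ".xlsx": lambda df, path: df.to_excel(path, index=False),
    ".xml": lambda df, path: df.to_xml(path, index=False),
}


def convert(input_file, output_file):
    """Convert input_file to output_file; return False if a suffix is unsupported."""
    in_suffix = pathlib.Path(input_file).suffix.lower()
    out_suffix = pathlib.Path(output_file).suffix.lower()
    # Check both suffixes first so nothing is read when the output is unsupported.
    if in_suffix not in READERS:
        print(f"Input file type not supported: {in_suffix}")
        return False
    if out_suffix not in WRITERS:
        print(f"Output file type not supported: {out_suffix}")
        return False
    df = READERS[in_suffix](input_file)
    WRITERS[out_suffix](df, output_file)
    return True


# Demo in a temporary directory: csv -> json, then an unsupported target.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "people.csv"
    dst = pathlib.Path(tmp) / "people.json"
    src.write_text("name,age\nAlice,30\n", encoding="utf-8")
    ok = convert(src, dst)
    result = dst.read_text(encoding="utf-8")
    unsupported = convert(src, pathlib.Path(tmp) / "people.txt")
```

Adding a format then means adding one entry to each table instead of another `elif` branch.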

Dependencies

pip install pandas
pip install openpyxl
pip install lxml

The export function

import os

import pandas as pd


def export(input_file, output_file):
    if not os.path.isfile(input_file):
        print('Input file does not exist')
        return

    # Step 1: read the input file into a DataFrame, chosen by suffix
    if input_file.endswith('.csv'):
        df = pd.read_csv(input_file, encoding='utf-8')
    elif input_file.endswith('.json'):
        df = pd.read_json(input_file, encoding='utf-8')
    elif input_file.endswith('.xlsx'):
        df = pd.read_excel(input_file)
    elif input_file.endswith('.xml'):
        df = pd.read_xml(input_file, encoding='utf-8')
    else:
        print('Input file type not supported')
        return

    # Step 2: write the DataFrame in the format given by the output suffix
    if output_file.endswith('.csv'):
        df.to_csv(output_file, index=False)
    elif output_file.endswith('.json'):
        df.to_json(output_file, orient='records', force_ascii=False)
    elif output_file.endswith('.xlsx'):
        df.to_excel(output_file, index=False)
    elif output_file.endswith('.xml'):
        df.to_xml(output_file, index=False)
    elif output_file.endswith('.html'):
        df.to_html(output_file, index=False, encoding='utf-8')
    else:
        print('Output file type not supported')
        return

The main function

import getopt
import sys


def main(argv):
    usage = 'usage: export.py -i <inputpath> -o <outputpath>'
    input_path = None
    output_path = None
    try:
        shortopts = "hi:o:"
        longopts = ["help", "ipath=", "opath="]
        opts, args = getopt.getopt(argv, shortopts, longopts)
    except getopt.GetoptError:
        print(usage)
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            print(usage)
            sys.exit()
        elif opt in ("-i", "--ipath"):
            input_path = arg
        elif opt in ("-o", "--opath"):
            output_path = arg
    # Both paths are required
    if not input_path or not output_path:
        print(usage)
        sys.exit(2)
    print(f'Input path: {input_path}')
    print(f'Output path: {output_path}')
    export(input_path, output_path)


if __name__ == '__main__':
    main(sys.argv[1:])
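
As an alternative to `getopt`, the standard library's `argparse` can generate the usage text and enforce required options by itself. This is a sketch of that swapped-in approach, not the method used above. It keeps the same `-i/--ipath` and `-o/--opath` options; the `parse_args` helper and the file names `data.csv` and `data.json` are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse -i/--ipath and -o/--opath; both are required."""
    parser = argparse.ArgumentParser(
        prog="export.py",
        description="Convert a file between csv, xlsx, json and xml.",
    )
    parser.add_argument("-i", "--ipath", required=True, help="input file path")
    parser.add_argument("-o", "--opath", required=True, help="output file path")
    return parser.parse_args(argv)


# Short and long option forms can be mixed on the same command line.
args = parse_args(["-i", "data.csv", "--opath", "data.json"])
```

The parsed values are then available as `args.ipath` and `args.opath`, and `-h/--help` is provided automatically.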

Appendix: Methods for other formats

The following tables are taken from the Input/output section of the official pandas documentation.

HTML

Method	Description
`read_html`(io, *[, match, flavor, header, …])	Read HTML tables into a `list` of `DataFrame` objects.
`DataFrame.to_html`([buf, columns, col_space, …])	Render a DataFrame as an HTML table.

Method	Description
`Styler.to_html`([buf, table_uuid, …])	Write Styler to a file, buffer or string in HTML-CSS format.

Pickling

Method	Description
`read_pickle`(filepath_or_buffer[, …])	Load pickled pandas object (or any object) from file.
`DataFrame.to_pickle`(path, *[, compression, …])	Pickle (serialize) object to file.
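
Unlike CSV, pickling preserves dtypes and the index exactly. A minimal round-trip sketch with made-up data and a temporary file (the name `frame.pkl` is arbitrary):

```python
import os
import tempfile

import pandas as pd

# A DataFrame with a datetime index, which plain CSV would turn back into strings.
df = pd.DataFrame(
    {"when": pd.to_datetime(["2024-01-01", "2024-06-30"]), "score": [1.5, 2.5]}
).set_index("when")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")
    df.to_pickle(path)
    # Only unpickle files from trusted sources: loading can run arbitrary code.
    restored = pd.read_pickle(path)
```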

Clipboard

Method	Description
`read_clipboard`([sep, dtype_backend])	Read text from clipboard and pass to `read_csv()`.
`DataFrame.to_clipboard`(*[, excel, sep])	Copy object to the system clipboard.

Latex

Method	Description
`DataFrame.to_latex`([buf, columns, header, …])	Render object to a LaTeX tabular, longtable, or nested table.

Method	Description
`Styler.to_latex`([buf, column_format, …])	Write Styler to a file, buffer or string in LaTeX format.

HDFStore: PyTables (HDF5)

Method	Description
`read_hdf`(path_or_buf[, key, mode, errors, …])	Read from the store, close it if we opened it.
`HDFStore.put`(key, value[, format, index, …])	Store object in HDFStore.
`HDFStore.append`(key, value[, format, axes, …])	Append to Table in file.
`HDFStore.get`(key)	Retrieve pandas object stored in file.
`HDFStore.select`(key[, where, start, stop, …])	Retrieve pandas object stored in file, optionally based on where criteria.
`HDFStore.info`()	Print detailed information on the store.
`HDFStore.keys`([include])	Return a list of keys corresponding to objects stored in HDFStore.
`HDFStore.groups`()	Return a list of all the top-level nodes.
`HDFStore.walk`([where])	Walk the pytables group hierarchy for pandas objects.

Warning

One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.

Feather

Method	Description
`read_feather`(path[, columns, use_threads, …])	Load a feather-format object from the file path.
`DataFrame.to_feather`(path, **kwargs)	Write a DataFrame to the binary Feather format.

Parquet

Method	Description
`read_parquet`(path[, engine, columns, …])	Load a parquet object from the file path, returning a DataFrame.
`DataFrame.to_parquet`([path, engine, …])	Write a DataFrame to the binary parquet format.

ORC

Method	Description
`read_orc`(path[, columns, dtype_backend, …])	Load an ORC object from the file path, returning a DataFrame.
`DataFrame.to_orc`([path, engine, index, …])	Write a DataFrame to the ORC format.

SAS

Method	Description
`read_sas`(filepath_or_buffer, *[, format, …])	Read SAS files stored as either XPORT or SAS7BDAT format files.

SPSS

Method	Description
`read_spss`(path[, usecols, …])	Load an SPSS file from the file path, returning a DataFrame.

SQL

Method	Description
`read_sql_table`(table_name, con[, schema, …])	Read SQL database table into a DataFrame.
`read_sql_query`(sql, con[, index_col, …])	Read SQL query into a DataFrame.
`read_sql`(sql, con[, index_col, …])	Read SQL query or database table into a DataFrame.
`DataFrame.to_sql`(name, con, *[, schema, …])	Write records stored in a DataFrame to a SQL database.
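
The SQL methods can be tried with the standard library's `sqlite3` module, which pandas accepts as a connection without SQLAlchemy. The sketch below uses an in-memory database; the table name `people` and the data are made up. It uses `read_sql_query` because `read_sql_table` requires SQLAlchemy.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# An in-memory SQLite database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
try:
    # Write the DataFrame to a new table, without the index column.
    df.to_sql("people", conn, index=False)
    # Read back a filtered subset; "?" is sqlite3's placeholder for parameters.
    adults = pd.read_sql_query(
        "SELECT name, age FROM people WHERE age > ?", conn, params=(26,)
    )
finally:
    conn.close()
```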

Google BigQuery

Method	Description
`read_gbq`(query[, project_id, index_col, …])	(DEPRECATED) Load data from Google BigQuery.

STATA

Method	Description
`read_stata`(filepath_or_buffer, *[, …])	Read Stata file into DataFrame.
`DataFrame.to_stata`(path, *[, convert_dates, …])	Export DataFrame object to Stata dta format.

Method	Description
`StataReader.data_label`	Return data label of Stata file.
`StataReader.value_labels`()	Return a nested dict associating each variable name to its value and label.
`StataReader.variable_labels`()	Return a dict associating each variable name with corresponding label.
`StataWriter.write_file`()	Export DataFrame object to Stata dta format.
