您的位置:首页 > 新闻 > 热点要闻 > 【PostgreSQL17新特性之-COPY FROM的ON_ERROR容错选项】

【PostgreSQL17新特性之-COPY FROM的ON_ERROR容错选项】

2024/10/6 7:44:02 来源:https://blog.csdn.net/weixin_47308871/article/details/139325589  浏览:    关键词:【PostgreSQL17新特性之-COPY FROM的ON_ERROR容错选项】

一、ON_ERROR容错选项

1.ON_ERROR容错选项使用方式

PostgreSQL17-beta1针对COPY FROM增加的ON_ERROR选项使得Copy From语句在执行过程中部分解析、数据格式、字符集等相关的报错可以根据需求选择报错回滚事务里的全部记录/丢弃输入行并继续处理下一行。

image.png

error_action可以选择的值为stop和ignore,具体的作用如下:

选项作用
stop使命令失败,ON_ERROR该选项的的默认值。和PostgreSQL-16及以前的默认行为一致。
ignore丢弃输入行并继续处理下一行。

针对ignore的这个作用,我测试过程发现有时候可以丢弃输入行并继续处理下一行。有时候并没有丢弃输入行并继续处理下一行,而是在错误行中断了。把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。出现了两种现象,具体可看后边测试及总结部分

大致的使用方式为如下所示:

copy test_copy_onerror  from '/home/postgres/1.sql' (ON_ERROR ignore);
copy test_copy_onerror  from '/home/postgres/1.sql' (ON_ERROR stop);

COPY的默认行为是"ON_ERROR stop",通过源码可以看到ON_ERROR选项的error_action目前也是仅有两种,源码注释里写着日后可能会添加更多的选项。

src/include/commands/copy.h/** Represents where to save input processing errors.  More values to be added* in the future.*/
typedef enum CopyOnErrorChoice
{COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */COPY_ON_ERROR_IGNORE,		/* ignore errors */
} CopyOnErrorChoice;----------------------------------------------------------------------------------------
src/backend/commands/copy.c/** Extract a CopyOnErrorChoice value from a DefElem.*/
static CopyOnErrorChoice
defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
{char	   *sval = defGetString(def);if (!is_from)ereport(ERROR,(errcode(ERRCODE_INVALID_PARAMETER_VALUE),errmsg("COPY ON_ERROR cannot be used with COPY TO"),parser_errposition(pstate, def->location)));/** Allow "stop", or "ignore" values.*/if (pg_strcasecmp(sval, "stop") == 0)return COPY_ON_ERROR_STOP;if (pg_strcasecmp(sval, "ignore") == 0)return COPY_ON_ERROR_IGNORE;ereport(ERROR,(errcode(ERRCODE_INVALID_PARAMETER_VALUE),errmsg("COPY ON_ERROR \"%s\" not recognized", sval),parser_errposition(pstate, def->location)));return COPY_ON_ERROR_STOP;	/* keep compiler quiet */
}

2.测试案例

(1)创建测试表test_copy_onerror

创建一个测试表test_copy_onerror,name的类型为varchar(5),插入超过5个字符的则会报错。

postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:48)=# create table test_copy_onerror(id int,name varchar(5));
CREATE TABLE
postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:55)=# \d test_copy_onerrorTable "public.test_copy_onerror"
+--------+----------------------+-----------+----------+---------+
| Column |         Type         | Collation | Nullable | Default |
+--------+----------------------+-----------+----------+---------+
| id     | integer              |           |          |         |
| name   | character varying(5) |           |          |         |
+--------+----------------------+-----------+----------+---------+postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:58)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)

(2)准备一些要copy到表里的数据

准备一些要copy到表里的数据,其中第三行和第五行的第二列数据都是有问题的,长度超了,不能copy到表里

[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ cat testdata.txt
1       aaaaa
2       bbbbb
3       cccccc
4       ddddd
5       ffffff

测试结果1:使用copy的默认行为

结果如下,可以看出事务回滚了,没有任何一条数据插入,并且报出了在第三行数据存在问题。

postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:24)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:27)=# copy test_copy_onerror  from '/home/postgres/testdata.txt';
ERROR:  value too long for type character varying(5)
CONTEXT:  COPY test_copy_onerror, line 3, column name: "cccccc"
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:30)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)

测试结果2:使用ON_ERROR选项配合stop值

结果如下,可以看出事务回滚了,没有任何一条数据插入,并且报出了在第三行数据存在问题。可以看出现象和COPY的默认行为一致。

postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:05)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:06)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR stop);
ERROR:  value too long for type character varying(5)
CONTEXT:  COPY test_copy_onerror, line 3, column name: "cccccc"
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:22)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)

测试结果3:使用ON_ERROR选项配合ignore值(错误的值是超出字段长度)

结果如下,它把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。。

postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:34:15)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:34:17)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR ignore);
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:35:34)=# table test_copy_onerror;
+----+-------+
| id | name  |
+----+-------+
|  1 | aaaaa |
|  2 | bbbbb |
+----+-------+
(2 rows)

测试结果4:使用ON_ERROR选项配合ignore值(错误的值是类型不匹配)

结果如下,错误的值是类型不匹配时,会丢弃输入行并继续处理下一行。只有报错的行没有导入到表里。

postgres<17beta1>(ConnAs[postgres]:PID[23019] 2024-05-28/20:10:14)=# create table t2 (id int);
CREATE TABLE
postgres<17beta1>(ConnAs[postgres]:PID[23050] 2024-05-28/20:15:50)=# \d t2Table "public.t2"
+--------+---------+-----------+----------+---------+
| Column |  Type   | Collation | Nullable | Default |
+--------+---------+-----------+----------+---------+
| id     | integer |           |          |         |
+--------+---------+-----------+----------+---------+//要导入的数据有三行,其中第二行数据和表的列数据不符
[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ cat 1.txt
1
n
2[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ psql
Border style is 2.
Line style is ascii.
psql (17beta1)
Type "help" for help.postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:16:56)=# table t2;
+----+
| id |
+----+
+----+
(0 rows)postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:17:02)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore);
NOTICE:  1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:17:10)=# table t2;
+----+
| id |
+----+
|  1 |
|  2 |
+----+
(2 rows)

4.总结

经过测试,对于copy from的不同错误类型,就算选择了同样的ON_ERROR ignore选项,行为也可能不一样。目前我的测试中展现出两种行为:

(1)选择ON_ERROR ignore选项,错误的值是超出字段长度

没有丢弃输入行并继续处理下一行,而是在错误行中断了。把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。

(2)选择ON_ERROR ignore选项,错误的值是类型不匹配

错误的值是类型不匹配时,会丢弃输入行并继续处理下一行。只有报错的行没有导入到表里。

二、pg_stat_progress_copy视图的改进

PostgreSQL 14 开始,添加了pg_stat_progress_copy视图来检查 COPY 处理的进度。

PostgreSQL17-beta1版本里,添加了一个tuples_skipped的列,由于包含格式错误的数据而被跳过的元组数。仅当为 ON_ERROR 选项指定了停止以外的值时,此计数器才会增加。

postgres<16.1>(ConnAs[postgres]:PID[53951] 2024-05-29/17:19:31)=# \d pg_stat_progress_copyView "pg_catalog.pg_stat_progress_copy"
+------------------+---------+-----------+----------+---------+
|      Column      |  Type   | Collation | Nullable | Default |
+------------------+---------+-----------+----------+---------+
| pid              | integer |           |          |         |
| datid            | oid     |           |          |         |
| datname          | name    |           |          |         |
| relid            | oid     |           |          |         |
| command          | text    |           |          |         |
| type             | text    |           |          |         |
| bytes_processed  | bigint  |           |          |         |
| bytes_total      | bigint  |           |          |         |
| tuples_processed | bigint  |           |          |         |
| tuples_excluded  | bigint  |           |          |         |
+------------------+---------+-----------+----------+---------+postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:38:19)=# \d pg_stat_progress_copyView "pg_catalog.pg_stat_progress_copy"
+------------------+---------+-----------+----------+---------+
|      Column      |  Type   | Collation | Nullable | Default |
+------------------+---------+-----------+----------+---------+
| pid              | integer |           |          |         |
| datid            | oid     |           |          |         |
| datname          | name    |           |          |         |
| relid            | oid     |           |          |         |
| command          | text    |           |          |         |
| type             | text    |           |          |         |
| bytes_processed  | bigint  |           |          |         |
| bytes_total      | bigint  |           |          |         |
| tuples_processed | bigint  |           |          |         |
| tuples_excluded  | bigint  |           |          |         |
| tuples_skipped   | bigint  |           |          |         |  <--增加的列
+------------------+---------+-----------+----------+---------+

三、COPY添加的LOG_VERBOSITY选项

COPY添加的LOG_VERBOSITY选项,这个选项可以报告 COPY FROM 被忽略的错误行。

image.png

verbosity可设置的值作用
default默认配置,报告跳过了多少行。
verbose冗余输出设置。对于有错误的每一行,报告哪一列有问题以及最后跳过了多少行。

目前verbosity可设置的值仅有两个

src/include/commands/copy.h... ...
/** Represents verbosity of logged messages by COPY command.*/
typedef enum CopyLogVerbosityChoice
{COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
... ...----------------------------------------------------------------------------------------
src/backend/commands/copy.c... ...
/** Extract a CopyLogVerbosityChoice value from a DefElem.*/
static CopyLogVerbosityChoice
defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
{char	   *sval;/** Allow "default", or "verbose" values.*/sval = defGetString(def);if (pg_strcasecmp(sval, "default") == 0)return COPY_LOG_VERBOSITY_DEFAULT;if (pg_strcasecmp(sval, "verbose") == 0)return COPY_LOG_VERBOSITY_VERBOSE;ereport(ERROR,(errcode(ERRCODE_INVALID_PARAMETER_VALUE),errmsg("COPY LOG_VERBOSITY \"%s\" not recognized", sval),parser_errposition(pstate, def->location)));return COPY_LOG_VERBOSITY_DEFAULT;	/* keep compiler quiet */
}
... ...

使用方式和是否使用 LOG_VERBOSITY verbose的对比如下,带有LOG_VERBOSITY verbose的会把跳过的行的行号以及具体数据均打印出来。

postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:05)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore);
NOTICE:  1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:06)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore,LOG_VERBOSITY default);
NOTICE:  1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:08)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore,LOG_VERBOSITY verbose);
NOTICE:  skipping row due to data type incompatibility at line 2 for column id: "n"
NOTICE:  1 row was skipped due to data type incompatibility
COPY 2
------------------
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:19)=#
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:20)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR ignore);
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:36)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR ignore,LOG_VERBOSITY default);
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:45)=# copy test_copy_onerror  from '/home/postgres/testdata.txt' (ON_ERROR ignore,LOG_VERBOSITY verbose);
NOTICE:  skipping row due to data type incompatibility at line 3 for column name: "cccccc"
NOTICE:  skipping row due to data type incompatibility at line 4 for column id: "4       dddd"
NOTICE:  skipping row due to data type incompatibility at line 5 for column id: "5       ffffff"
NOTICE:  3 rows were skipped due to data type incompatibility
COPY 2

image.png

版权声明:

本网仅为发布的内容提供存储空间,不对发表、转载的内容提供任何形式的保证。凡本网注明“来源:XXX网络”的作品,均转载自其它媒体,著作权归作者所有,商业转载请联系作者获得授权,非商业转载请注明出处。

我们尊重并感谢每一位作者,均已注明文章来源和作者。如因作品内容、版权或其它问题,请及时与我们联系,联系邮箱:809451989@qq.com,投稿邮箱:809451989@qq.com