一、ON_ERROR容错选项
1.ON_ERROR容错选项使用方式
PostgreSQL17-beta1针对COPY FROM增加的ON_ERROR选项使得Copy From语句在执行过程中部分解析、数据格式、字符集等相关的报错可以根据需求选择报错回滚事务里的全部记录/丢弃输入行并继续处理下一行。
error_action可以选择的值为stop和ignore,具体的作用如下:
选项 | 作用 |
---|---|
stop | 使命令失败,ON_ERROR该选项的的默认值。和PostgreSQL-16及以前的默认行为一致。 |
ignore | 丢弃输入行并继续处理下一行。 |
针对ignore的这个作用,我测试过程发现有时候可以丢弃输入行并继续处理下一行。有时候并没有丢弃输入行并继续处理下一行,而是在错误行中断了。把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。出现了两种现象,具体可看后边测试及总结部分
大致的使用方式为如下所示:
copy test_copy_onerror from '/home/postgres/1.sql' (ON_ERROR ignore);
copy test_copy_onerror from '/home/postgres/1.sql' (ON_ERROR stop);
COPY的默认行为是"ON_ERROR stop",通过源码可以看到ON_ERROR选项的error_action目前也是仅有两种,源码注释里写着日后可能会添加更多的选项。
src/include/commands/copy.h/** Represents where to save input processing errors. More values to be added* in the future.*/
typedef enum CopyOnErrorChoice
{COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */COPY_ON_ERROR_IGNORE, /* ignore errors */
} CopyOnErrorChoice;----------------------------------------------------------------------------------------
src/backend/commands/copy.c/** Extract a CopyOnErrorChoice value from a DefElem.*/
static CopyOnErrorChoice
defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
{char *sval = defGetString(def);if (!is_from)ereport(ERROR,(errcode(ERRCODE_INVALID_PARAMETER_VALUE),errmsg("COPY ON_ERROR cannot be used with COPY TO"),parser_errposition(pstate, def->location)));/** Allow "stop", or "ignore" values.*/if (pg_strcasecmp(sval, "stop") == 0)return COPY_ON_ERROR_STOP;if (pg_strcasecmp(sval, "ignore") == 0)return COPY_ON_ERROR_IGNORE;ereport(ERROR,(errcode(ERRCODE_INVALID_PARAMETER_VALUE),errmsg("COPY ON_ERROR \"%s\" not recognized", sval),parser_errposition(pstate, def->location)));return COPY_ON_ERROR_STOP; /* keep compiler quiet */
}
2.测试案例
(1)创建测试表test_copy_onerror
创建一个测试表test_copy_onerror,name的类型为varchar(5),插入超过5个字符的则会报错。
postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:48)=# create table test_copy_onerror(id int,name varchar(5));
CREATE TABLE
postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:55)=# \d test_copy_onerrorTable "public.test_copy_onerror"
+--------+----------------------+-----------+----------+---------+
| Column | Type | Collation | Nullable | Default |
+--------+----------------------+-----------+----------+---------+
| id | integer | | | |
| name | character varying(5) | | | |
+--------+----------------------+-----------+----------+---------+postgres<17beta1>(ConnAs[postgres]:PID[22701] 2024-05-28/19:17:58)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)
(2)准备一些要copy到表里的数据
准备一些要copy到表里的数据,其中第三行和第五行的第二列数据都是有问题的,长度超了,不能copy到表里
[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ cat testdata.txt
1 aaaaa
2 bbbbb
3 cccccc
4 ddddd
5 ffffff
测试结果1:使用copy的默认行为
结果如下,可以看出事务回滚了,没有任何一条数据插入,并且报出了在第三行数据存在问题。
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:24)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:27)=# copy test_copy_onerror from '/home/postgres/testdata.txt';
ERROR: value too long for type character varying(5)
CONTEXT: COPY test_copy_onerror, line 3, column name: "cccccc"
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:30:30)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)
测试结果2:使用ON_ERROR选项配合stop值
结果如下,可以看出事务回滚了,没有任何一条数据插入,并且报出了在第三行数据存在问题。可以看出现象和COPY的默认行为一致。
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:05)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:06)=# copy test_copy_onerror from '/home/postgres/testdata.txt' (ON_ERROR stop);
ERROR: value too long for type character varying(5)
CONTEXT: COPY test_copy_onerror, line 3, column name: "cccccc"
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:32:22)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)
测试结果3:使用ON_ERROR选项配合ignore值(错误的值是超出字段长度)
结果如下,它把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。。
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:34:15)=# table test_copy_onerror;
+----+------+
| id | name |
+----+------+
+----+------+
(0 rows)postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:34:17)=# copy test_copy_onerror from '/home/postgres/testdata.txt' (ON_ERROR ignore);
NOTICE: 3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[22786] 2024-05-28/19:35:34)=# table test_copy_onerror;
+----+-------+
| id | name |
+----+-------+
| 1 | aaaaa |
| 2 | bbbbb |
+----+-------+
(2 rows)
测试结果4:使用ON_ERROR选项配合ignore值(错误的值是类型不匹配)
结果如下,错误的值是类型不匹配时,会丢弃输入行并继续处理下一行。只有报错的行没有导入到表里。
postgres<17beta1>(ConnAs[postgres]:PID[23019] 2024-05-28/20:10:14)=# create table t2 (id int);
CREATE TABLE
postgres<17beta1>(ConnAs[postgres]:PID[23050] 2024-05-28/20:15:50)=# \d t2Table "public.t2"
+--------+---------+-----------+----------+---------+
| Column | Type | Collation | Nullable | Default |
+--------+---------+-----------+----------+---------+
| id | integer | | | |
+--------+---------+-----------+----------+---------+//要导入的数据有三行,其中第二行数据和表的列数据不符
[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ cat 1.txt
1
n
2[postgres@xmaster-PostgreSQL-17beta1-06 ~]$ psql
Border style is 2.
Line style is ascii.
psql (17beta1)
Type "help" for help.postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:16:56)=# table t2;
+----+
| id |
+----+
+----+
(0 rows)postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:17:02)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore);
NOTICE: 1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23083] 2024-05-28/20:17:10)=# table t2;
+----+
| id |
+----+
| 1 |
| 2 |
+----+
(2 rows)
4.总结
经过测试,对于copy from的不同错误类型,就算选择了同样的ON_ERROR ignore选项,行为也可能不一样。目前我的测试中展现出两种行为:
(1)选择ON_ERROR ignore选项,错误的值是超出字段长度
没有丢弃输入行并继续处理下一行,而是在错误行中断了。把报错行之前的数据写入到表里,然后提交,报错行之后的数据全部忽略,即使报错行后有正确数据的行,也不能copy进表里。
(2)选择ON_ERROR ignore选项,错误的值是类型不匹配
错误的值是类型不匹配时,会丢弃输入行并继续处理下一行。只有报错的行没有导入到表里。
二、pg_stat_progress_copy视图的改进
PostgreSQL 14 开始,添加了pg_stat_progress_copy视图来检查 COPY 处理的进度。
PostgreSQL17-beta1版本里,添加了一个tuples_skipped的列,由于包含格式错误的数据而被跳过的元组数。仅当为 ON_ERROR 选项指定了停止以外的值时,此计数器才会增加。
postgres<16.1>(ConnAs[postgres]:PID[53951] 2024-05-29/17:19:31)=# \d pg_stat_progress_copyView "pg_catalog.pg_stat_progress_copy"
+------------------+---------+-----------+----------+---------+
| Column | Type | Collation | Nullable | Default |
+------------------+---------+-----------+----------+---------+
| pid | integer | | | |
| datid | oid | | | |
| datname | name | | | |
| relid | oid | | | |
| command | text | | | |
| type | text | | | |
| bytes_processed | bigint | | | |
| bytes_total | bigint | | | |
| tuples_processed | bigint | | | |
| tuples_excluded | bigint | | | |
+------------------+---------+-----------+----------+---------+postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:38:19)=# \d pg_stat_progress_copyView "pg_catalog.pg_stat_progress_copy"
+------------------+---------+-----------+----------+---------+
| Column | Type | Collation | Nullable | Default |
+------------------+---------+-----------+----------+---------+
| pid | integer | | | |
| datid | oid | | | |
| datname | name | | | |
| relid | oid | | | |
| command | text | | | |
| type | text | | | |
| bytes_processed | bigint | | | |
| bytes_total | bigint | | | |
| tuples_processed | bigint | | | |
| tuples_excluded | bigint | | | |
| tuples_skipped | bigint | | | | <--增加的列
+------------------+---------+-----------+----------+---------+
三、COPY添加的LOG_VERBOSITY选项
COPY添加的LOG_VERBOSITY选项,这个选项可以报告 COPY FROM 被忽略的错误行。
verbosity可设置的值 | 作用 |
---|---|
default | 默认配置,报告跳过了多少行。 |
verbose | 冗余输出设置。对于有错误的每一行,报告哪一列有问题以及最后跳过了多少行。 |
目前verbosity可设置的值仅有两个
src/include/commands/copy.h... ...
/** Represents verbosity of logged messages by COPY command.*/
typedef enum CopyLogVerbosityChoice
{COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
} CopyLogVerbosityChoice;
... ...----------------------------------------------------------------------------------------
src/backend/commands/copy.c... ...
/** Extract a CopyLogVerbosityChoice value from a DefElem.*/
static CopyLogVerbosityChoice
defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
{char *sval;/** Allow "default", or "verbose" values.*/sval = defGetString(def);if (pg_strcasecmp(sval, "default") == 0)return COPY_LOG_VERBOSITY_DEFAULT;if (pg_strcasecmp(sval, "verbose") == 0)return COPY_LOG_VERBOSITY_VERBOSE;ereport(ERROR,(errcode(ERRCODE_INVALID_PARAMETER_VALUE),errmsg("COPY LOG_VERBOSITY \"%s\" not recognized", sval),parser_errposition(pstate, def->location)));return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */
}
... ...
使用方式和是否使用 LOG_VERBOSITY verbose的对比如下,带有LOG_VERBOSITY verbose的会把跳过的行的行号以及具体数据均打印出来。
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:05)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore);
NOTICE: 1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:06)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore,LOG_VERBOSITY default);
NOTICE: 1 row was skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:08)=# copy t2 from '/home/postgres/1.txt' (ON_ERROR ignore,LOG_VERBOSITY verbose);
NOTICE: skipping row due to data type incompatibility at line 2 for column id: "n"
NOTICE: 1 row was skipped due to data type incompatibility
COPY 2
------------------
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:19)=#
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:20)=# copy test_copy_onerror from '/home/postgres/testdata.txt' (ON_ERROR ignore);
NOTICE: 3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:36)=# copy test_copy_onerror from '/home/postgres/testdata.txt' (ON_ERROR ignore,LOG_VERBOSITY default);
NOTICE: 3 rows were skipped due to data type incompatibility
COPY 2
postgres<17beta1>(ConnAs[postgres]:PID[23159] 2024-05-28/20:45:45)=# copy test_copy_onerror from '/home/postgres/testdata.txt' (ON_ERROR ignore,LOG_VERBOSITY verbose);
NOTICE: skipping row due to data type incompatibility at line 3 for column name: "cccccc"
NOTICE: skipping row due to data type incompatibility at line 4 for column id: "4 dddd"
NOTICE: skipping row due to data type incompatibility at line 5 for column id: "5 ffffff"
NOTICE: 3 rows were skipped due to data type incompatibility
COPY 2