
PostgreSQL Source Code Walkthrough (168) - Query #88 (Lexical definitions in PG: scanner.l) #1

Original article · PostgreSQL · Author: husthxd · Date: 2019-04-15 15:23:06


User subroutines




/*
 * scan.l
 *    lexical scanner for PostgreSQL
 *
 * NOTE NOTE NOTE:
 *
 * The rules in this file must be kept in sync with src/fe_utils/psqlscan.l!
 *
 * The rules are designed so that the scanner never has to backtrack,
 * in the sense that there is always a rule that can match the input
 * consumed so far (the rule action may internally throw back some input
 * with yyless(), however).  As explained in the flex manual, this makes
 * for a useful speed increase --- about a third faster than a plain -CF
 * lexer, in simple testing.  The extra complexity is mostly in the rules
 * for handling float numbers and continued string literals.  If you change
 * the lexical rules, verify that you haven't broken the no-backtrack
 * property by running flex with the "-b" option and checking that the
 * resulting "lex.backup" file says that no backing up is needed.  (As of
 * Postgres 9.2, this check is made automatically by the Makefile.)
 *
 * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *    src/backend/parser/scan.l
 */
#include "postgres.h"
#include <ctype.h>
#include <unistd.h>
#include "common/string.h"
#include "parser/gramparse.h"
#include "parser/parser.h"      /* only needed for GUC variables */
#include "parser/scansup.h"
#include "mb/pg_wchar.h"
//------------------ Declarations section
/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
#undef fprintf
#define fprintf(file, fmt, msg)  fprintf_to_ereport(fmt, msg)
static void
fprintf_to_ereport(const char *fmt, const char *msg)
{
    ereport(ERROR, (errmsg_internal("%s", msg)));
}
/*
 * GUC variables.  This is a DIRECT violation of the warning given at the
 * head of gram.y, ie flex/bison code must not depend on any GUC variables;
 * as such, changing their values can induce very unintuitive behavior.
 * But we shall have to live with it until we can remove these variables.
 */
int         backslash_quote = BACKSLASH_QUOTE_SAFE_ENCODING;
bool        escape_string_warning = true;
bool        standard_conforming_strings = true;
/*
 * Set the type of YYSTYPE.
 * In Bison, the global variable yylval has type YYSTYPE, which defaults to int.
 * Internally, bison declares each value as a C union that includes all of the types.
 * You list all of the types in %union declarations.
 * Bison turns them into a typedef for a union type called YYSTYPE.
 */
#define YYSTYPE core_YYSTYPE
/*
 * Set the type of yyextra.  All state variables used by the scanner should
 * be in yyextra, *not* statically allocated.
 */
#define YY_EXTRA_TYPE core_yy_extra_type *
/*
 * Each call to yylex must set yylloc to the location of the found token
 * (expressed as a byte offset from the start of the input text).
 * When we parse a token that requires multiple lexer rules to process,
 * this should be done in the first such rule, else yylloc will point
 * into the middle of the token.
 */
#define SET_YYLLOC()  (*(yylloc) = yytext - yyextra->scanbuf)
/*
 * Advance yylloc by the given number of bytes.
 */
#define ADVANCE_YYLLOC(delta)  ( *(yylloc) += (delta) )
#define startlit()  ( yyextra->literallen = 0 )
static void addlit(char *ytext, int yleng, core_yyscan_t yyscanner);
static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
static char *litbufdup(core_yyscan_t yyscanner);
static char *litbuf_udeescape(unsigned char escape, core_yyscan_t yyscanner);
static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
static int  process_integer_literal(const char *token, YYSTYPE *lval);
static bool is_utf16_surrogate_first(pg_wchar c);
static bool is_utf16_surrogate_second(pg_wchar c);
static pg_wchar surrogate_pair_to_codepoint(pg_wchar first, pg_wchar second);
static void addunicode(pg_wchar c, yyscan_t yyscanner);
static bool check_uescapechar(unsigned char escape);
#define yyerror(msg)  scanner_yyerror(msg, yyscanner)
#define lexer_errposition()  scanner_errposition(*(yylloc), yyscanner)
static void check_string_escape_warning(unsigned char ychar, core_yyscan_t yyscanner);
static void check_escape_warning(core_yyscan_t yyscanner);
/*
 * Work around a bug in flex 2.5.35: it emits a couple of functions that
 * it forgets to emit declarations for.  Since we use -Wmissing-prototypes,
 * this would cause warnings.  Providing our own declarations should be
 * harmless even when the bug gets fixed.
 */
extern int  core_yyget_column(yyscan_t yyscanner);
extern void core_yyset_column(int column_no, yyscan_t yyscanner);



Source: ITPUB Blog. When reposting, please credit the source; otherwise legal liability will be pursued.
