Syncing MySQL to Elasticsearch (logstash-input-jdbc) and some query issues

ZJL-阿友 · published 2017-09-07 16:24:28

On Linux:

Installing Logstash:
1. Import the public signing key

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

2. Add the yum repository

vim  /etc/yum.repos.d/logstash.repo

Write the following into the file:

[logstash-5.x]
name=Elastic repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

3. Install with yum

yum install logstash

4. Verify the installation
Change into the Logstash installation directory

cd /usr/share/logstash

Run

bin/logstash -e 'input { stdin { } } output { stdout {} }'

After a few seconds you should see

The stdin plugin is now waiting for input:

Then type
hello world

If Logstash echoes the input back, the installation works.

Installing the logstash-input-jdbc plugin:

1. Switch the Ruby gem mirror
Install gem first if it is not already installed

yum install gem

Replace the default source with a mirror inside China

gem sources --add https://gems.ruby-china.org/ --remove https://rubygems.org/

Verify the change

gem sources -l

If the output lists the mirror URL above, the change succeeded.

Change the source address in the Gemfile:

whereis logstash # locate the Logstash installation; the default is /usr/share/logstash

cd /usr/share/logstash

vim Gemfile

Change the value of source to "https://gems.ruby-china.org/"

vim  Gemfile.jruby-1.9.lock # find remote and change its value to https://gems.ruby-china.org/

Then start the installation

bin/logstash-plugin  install logstash-input-jdbc

The installation shows no progress bar, so do not assume it has hung. I made that mistake the first time and interrupted it by hand.

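2. Start syncing the MySQL data

Files needed: one .conf file and zero or more .sql files (the SQL can also be written inline in the .conf, so the .sql files are optional)

Download the MySQL Java driver from the MySQL website: mysql-connector-java-5.1.44-bin.jar

Below is a .conf file that imports several tables: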

input {
    stdin {
    }
    jdbc {
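      # database to connect to
      jdbc_connection_string => "jdbc:mysql://xxx.xxx.xxx.xxx:3306/dbname"
      jdbc_user => "root"
      jdbc_password => "xxxxx"
      # path to the JDBC driver jar
      jdbc_driver_library => "mysql-connector-java-5.1.44-bin.jar"
      # standard value for this driver, normally left unchanged
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      # usual setting, normally left unchanged
      jdbc_paging_enabled => "true"
      # usual setting, normally left unchanged
      jdbc_page_size => "50000"
      # SQL file to execute
      statement_filepath => "estest1.sql"
      # statement => "write the SQL inline here instead of using a file; handy for short statements"
      schedule => "* * * * *"
      # type is used in the output section to route events. If your table has its own type column and you need it, either rename that column with AS in the SQL or use a different field name here.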
      type => "a_data"
    }
    jdbc {
      jdbc_connection_string => "jdbc:mysql://xxx.xxx.xxx.xxx:3306/dbname"
      jdbc_user => "root"
      jdbc_password => "xxxx"
      jdbc_driver_library => "mysql-connector-java-5.1.44-bin.jar"
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      jdbc_paging_enabled => "true"
      jdbc_page_size => "50000"
      statement_filepath => "esztest2.sql"
      schedule => "* * * * *"
      type => "b_data"
    }
    jdbc {
      jdbc_connection_string => "jdbc:mysql://xxx.xxx.xxx.xxx:3306/dbname"
      jdbc_user => "root"
      jdbc_password => "xxxx"
      jdbc_driver_library => "mysql-connector-java-5.1.44-bin.jar"
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      jdbc_paging_enabled => "true"
      jdbc_page_size => "50000"
      statement_filepath => "estest3.sql"
      schedule => "* * * * *"
      type => "c_data"
    }
}

output {
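    # route events by the type defined in the input section
    if[type] == "a_data"{
        elasticsearch {
        hosts  => "xxx.xxx.xxx.xxx:9200"
        # index name
        index => "estest"
        # document type
        document_type => "a_data"
        # document id: uses the id column of the SQL result; if the query has no id column, alias a unique column AS id
        document_id => "%{id}"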
        }
    }
    if[type] == "b_data"{
        elasticsearch {
        hosts  => "xxx.xxx.xxx.xxx:9200"
        index => "estest"
        document_type => "b_data"
        document_id => "%{id}"
        }
    }
    if[type] == "c_data"{
        elasticsearch {
        hosts  => "xxx.xxx.xxx.xxx:9200"
        index => "estest"
        document_type => "c_data"
        document_id => "%{id}"
        }
    }
    # also print every event to the console
    stdout {
        codec => json_lines
    }
}

This syncs three tables, one per jdbc block.

Write each SQL file to suit your own needs, for example:
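Since the three jdbc blocks and the three output conditionals differ only in the type and the SQL file, the config can be generated from a single list, which keeps the input type and the output condition from drifting apart. The following is only a sketch: render_pipeline and its parameters are hypothetical names introduced here, and the option values are copied from the config above.

```python
# Hypothetical generator: renders the pipeline from one list of
# (type, sql_file) pairs so input and output types always agree.
JDBC_TEMPLATE = """    jdbc {{
      jdbc_connection_string => "{url}"
      jdbc_user => "{user}"
      jdbc_password => "{password}"
      jdbc_driver_library => "{driver}"
      jdbc_driver_class => "com.mysql.jdbc.Driver"
      jdbc_paging_enabled => "true"
      jdbc_page_size => "50000"
      statement_filepath => "{sql_file}"
      schedule => "* * * * *"
      type => "{doc_type}"
    }}"""

OUTPUT_TEMPLATE = """    if [type] == "{doc_type}" {{
        elasticsearch {{
            hosts => "{es_host}"
            index => "{index}"
            document_type => "{doc_type}"
            document_id => "%{{id}}"
        }}
    }}"""


def render_pipeline(tables, url, user, password, driver, es_host, index):
    """Return the text of a .conf file with one jdbc input per table."""
    inputs = "\n".join(
        JDBC_TEMPLATE.format(
            url=url, user=user, password=password, driver=driver,
            sql_file=sql_file, doc_type=doc_type,
        )
        for doc_type, sql_file in tables
    )
    outputs = "\n".join(
        OUTPUT_TEMPLATE.format(es_host=es_host, index=index, doc_type=doc_type)
        for doc_type, _ in tables
    )
    return f"input {{\n{inputs}\n}}\n\noutput {{\n{outputs}\n}}\n"


if __name__ == "__main__":
    print(render_pipeline(
        [("a_data", "estest1.sql"), ("b_data", "estest2.sql"), ("c_data", "estest3.sql")],
        url="jdbc:mysql://xxx.xxx.xxx.xxx:3306/dbname",
        user="root",
        password="xxxxx",
        driver="mysql-connector-java-5.1.44-bin.jar",
        es_host="xxx.xxx.xxx.xxx:9200",
        index="estest",
    ))
```

Redirect the printed text into a .conf file and start Logstash with it as usual.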

SELECT * FROM xxx WHERE update_time> :sql_last_value

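Filtering on the update_time column gives incremental sync (a unique, increasing id works too); without a WHERE clause every run is a full sync. Note that :sql_last_value holds the time of the previous run by default; to compare against a column value such as an id, set use_column_value => true and tracking_column in the jdbc input.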
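The idea behind incremental sync can be illustrated outside Logstash. The sketch below is not how the plugin is implemented; it uses an in-memory SQLite table (the demo table and the helper names are made up for illustration) and remembers the largest update_time seen, which corresponds to the column-tracking mode of the plugin rather than its default last-run-time mode.

```python
import sqlite3


def fetch_changed_rows(conn, last_value):
    """Return rows whose update_time is newer than last_value, oldest first."""
    cur = conn.execute(
        "SELECT id, name, update_time FROM demo "
        "WHERE update_time > ? ORDER BY update_time",
        (last_value,),
    )
    return cur.fetchall()


def sync_once(conn, last_value):
    """Run one sync pass and return (rows, new marker)."""
    rows = fetch_changed_rows(conn, last_value)
    if rows:
        # Remember the newest update_time so the next pass skips these rows.
        last_value = rows[-1][2]
    return rows, last_value


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT, update_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO demo VALUES (?, ?, ?)",
        [
            (1, "first", "2017-09-07 10:00:00"),
            (2, "second", "2017-09-07 11:00:00"),
        ],
    )
    # First pass: the marker starts at the epoch, so every row is returned.
    first_rows, marker = sync_once(conn, "1970-01-01 00:00:00")
    # One row changes; only that row should come back on the next pass.
    conn.execute(
        "UPDATE demo SET name = ?, update_time = ? WHERE id = ?",
        ("second-edited", "2017-09-07 12:00:00", 2),
    )
    second_rows, marker = sync_once(conn, marker)
    return first_rows, second_rows, marker


if __name__ == "__main__":
    print(demo())
```

Because the document id in the output is the row id, a changed row overwrites its existing document in ES instead of creating a duplicate.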

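One problem showed up when querying ES, and I still have not found the cause: searching for long values with many digits returns nothing, while one- or two-digit long values are found. As a workaround, I cast the MySQL numeric columns to strings with the CONVERT function during sync.

SELECT CONVERT(e.`xx_id`,CHAR) as xx_id FROM xxx e WHERE update_time> :sql_last_value

This way the values arrive in ES as strings.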

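Sometimes ES takes over a complex MySQL query whose WHERE clause has the shape (a or b) and (c or d or e or f) and g.

The ES query looks like this (the lines starting with # are annotations only; JSON does not allow comments, so remove them before sending the request):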

{
    "query": {
        "bool": {
            # must: every clause has to match, equivalent to AND
            "must": [
                {
                    "match": {
                        "g": "1111"
                    }
                },
                {
                    "bool": {
                        # should: at least one clause has to match, equivalent to OR here
                        "should": [
                            {
                                "match": {
                                    "a": "1789104"
                                }
                            },
                            {
                                "match": {
                                    "b": "1789104"
                                }
                            }
                        ]
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "c": "有限公司"
                                }
                            },
                            {
                                "match": {
                                    "d": "有限公司"
                                }
                            },
                            {
                                "match": {
                                    "e": "有限公司"
                                }
                            },
                            {
                                "match": {
                                    "f": "有限公司"
                                }
                            }
                        ]
                    }
                }
            ],
            # must_not: none of these clauses may match
            "must_not": [],
            "should": []
        }
    },
    # offset of the first hit to return
    "from": 0,
    # number of hits to return (not an end position)
    "size": 20,
    "sort": [],
    "aggs": {}
}

This query is the equivalent of the SQL condition (a or b) and (c or d or e or f) and g.

Complex matching is built by nesting bool queries that combine must (AND) and should (OR).
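When such queries are assembled in application code, small helpers keep the nesting readable. This is a minimal sketch that only builds the request body as a Python dict; the helper names (match, any_of, all_of, build_query) are hypothetical, and the explicit minimum_should_match is my addition to make the "at least one" intent visible, not part of the query above.

```python
import json


def match(field, value):
    """A single match clause."""
    return {"match": {field: value}}


def any_of(*clauses):
    """OR: a bool query where at least one should clause has to match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


def all_of(*clauses):
    """AND: a bool query where every must clause has to match."""
    return {"bool": {"must": list(clauses)}}


def build_query(id_value, keyword, g_value, size=20):
    """Body for (a or b) and (c or d or e or f) and g."""
    query = all_of(
        match("g", g_value),
        any_of(match("a", id_value), match("b", id_value)),
        any_of(*(match(field, keyword) for field in ("c", "d", "e", "f"))),
    )
    return {"query": query, "from": 0, "size": size}


if __name__ == "__main__":
    body = build_query("1789104", "有限公司", "1111")
    print(json.dumps(body, ensure_ascii=False, indent=4))
```

The printed JSON can be sent as the body of a _search request against the index.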

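When you cannot tell in advance whether a search keyword is Chinese, English, or numeric, but still want reasonably precise matching, you can use a wildcard or regexp query (I have not used regexp, so I cannot comment on it; wildcard works for letters, digits, and mixes of the two).

Below is an (a or b) and (c or d) query in which c uses the "wildcard" query. One thing to watch: even when head shows the stored value with uppercase letters, a wildcard pattern containing uppercase letters finds nothing, and only lowercase matches. This is most likely because the default analyzer lowercases the terms of a text field at index time, while the wildcard pattern is not analyzed. For a better user experience, convert the user's input to lowercase before building the query.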

{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "a": "18396893"
                                }
                            },
                            {
                                "match": {
                                    "b": "18396893"
                                }
                            }
                        ]
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "wildcard": {
                                    "c": "*3zz*"
                                }
                            },
                            {
                                "match": {
                                    "d": "项目名称"
                                }
                            }
                        ]
                    }
                }
            ],
            "must_not": [],
            "should": []
        }
    },
    "from": 0,
    "size": 20,
    "sort": [],
    "aggs": {}
}
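The lowercasing step can be done where the query is built. This is a small sketch under the assumption that field c is a text field indexed with the default analyzer; wildcard_contains and build_search are hypothetical helper names, and the sketch does not escape * or ? that the user might type.

```python
def wildcard_contains(field, user_input):
    """Wildcard clause matching terms that contain the lowercased input."""
    term = user_input.strip().lower()
    return {"wildcard": {field: f"*{term}*"}}


def build_search(id_value, keyword, size=20):
    """Body for (a or b) and (c or d), with c matched by wildcard."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"bool": {"should": [
                        {"match": {"a": id_value}},
                        {"match": {"b": id_value}},
                    ]}},
                    {"bool": {"should": [
                        wildcard_contains("c", keyword),
                        {"match": {"d": keyword}},
                    ]}},
                ]
            }
        },
        "from": 0,
        "size": size,
    }


if __name__ == "__main__":
    # Uppercase input is normalized before it reaches the wildcard pattern.
    print(build_search("18396893", "3ZZ"))
```

Only the wildcard pattern is lowercased; the match clause on d receives the input unchanged because match queries are analyzed by ES itself.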