原文地址:http://docs.memsql.com/docs/installation-best-practices

Installation Best Practices This section lists the best practices for
properly configuring your MemSQL installation.

安装最佳实践
这一节列出了合理配置MemSQL的最佳实践。

MemSQL Database Configuration Configure Linux ulimit settings Most
Linux operating systems provide ways to control the usage of system
resources such as threads, files and network at an individual user or
process level. The per-user limitations for resources are called
ulimits, and they prevent single users from consuming too much system
resources. For optimal performance, MemSQL recommends setting ulimits
to higher values than the default Linux settings. The ulimit settings
can be configured in the /etc/security/limits.conf file, or directly
via shell commands.

MemSQL数据库配置
配置linux的ulimit选项
很多Linux操作系统为每一个用户或者每一个进程提供了控制线程,文件和网络等系统资源的控制,每一个用户的资源控制叫做ulimit,
这个变量组织单一用户消耗太多的系统资源,为了优化性能,MemSQL建议把ulimit设置的比系统默认值高,这个变量可以在
/etc/security/limits.conf中配置,或者直接通过命令行

Increase File Descriptor Limit The MemSQL cluster uses a substantial
number of client and server connections between aggregators and leaves
to run queries and cluster operations. We recommend setting the Linux
file descriptor limit to the highest possible value (at least 64,000)
to account for these connections. Failing to increase this limit can
significantly degrade performance and even cause connection limit
errors. Permanently increase this limit for all users by editing the
/etc/security/limits.conf file as root, and adding the lines: Shell *
soft NOFILE 1000000
* hard NOFILE 1000000 Alternatively, you can set the value for your session by running the following command in your shell: Shell ulimit
-n 1000000 For more information about setting the file descriptor limit on Linux, see:
http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files.

增加文件描述符限制
MemSQL集群使用大量的数字用于在aggregator和leaf的客户端和服务器的连接来运行查询和集群操作,我们建议为了连接性能考虑设置linux文件描述符最可能的大
(至少64000),不能增加这个限制会明显的降低性能,甚至导致连接失败。
永久的增加这个限制可以通过使用root角色登录来编辑/etc/security/limits.conf,增加一行:
* soft NOFILE 1000000
* hard NOFILE 1000000
另外,如果你愿意的话,你可以在命令行中设置你本次会话的值:
ulimit -n 1000000
http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files

Increase Maximum Process Limit MemSQL also recommends increasing the
number of processes allowed to be run by users. Permanently increase
this limit for all users by editing the /etc/security/limits.conf file
as root, and adding the lines: Shell
* soft NPROC 128000
* hard NPROC 128000 Alternatively, you can set the value for your session by running the following command in your shell: Shell ulimit
-u 128000

增加最大进程的限制
MemSQL也建议增加用户可以运行的进程数,永久的增加进程可以root身份修改/etc/security/limits.conf,增加:
* soft NPROC 128000
* hard NPROC 128000
也可以通过命令行设置本次会话的值
ulimit -u 128000

Configure Linux vm settings MemSQL recommends setting the following
settings via sysctl to minimize likelihood of getting memory errors.
You can use the /sbin/sysctl command to view, set and automate
settings in the /proc/sys/ directory. Permanently set these variables
by editing the /etc/sysctl.conf file as root, and adding these lines:
Shell vm.max_map_count=1000000000 vm.min_free_kbytes=500000
vm.swappiness=10 If MemSQL Ops is used to configure vm setting
defaults, it will set vm.min_free_kbytes to the minimum of 1% of
system RAM and 4 GB.

设置linux的vm
MemSQL建议通过“sysctl”设置将内存出错的可能性降到最小,你可以使用/sbin/sysctl命令去查看,设置和在/proc/sys/中自动化设置。
通过以root身份编辑/etc/sysctl.conf永久的设置变量,增加:
vm.max_map_count=1000000000
vm.min_free_kbytes=500000
vm.swappiness=10
如果使用MemSQL ops的默认设置,它将要设置vm.min_free_kbytes为系统RAM和4GB最小值的1%

Create swap files It is recommended that users explicity configure
swap files. It is recommended that you create a swap file so that the
operating system does not immediately start killing processes if
MemSQL runs out of memory. Here is a sample shell script to create the
swap file: Shell snapshot_dir=”/var/lib/memsql/data/snapshots”
snapshot_file=” snapshotdir/memsql.swp"mkdirp snapshot_dir dd
if=/dev/zero of= snapshotfilebs=1Mcount=10240chmod600 snapshot_file mkswap snapshotfileswapon snapshot_file echo
“$snapshot_file swap swap defaults 0 0” | tee -a /etc/fstab For more
information on configuring swap files, see:
http://www.cyberciti.biz/faq/linux-add-a-swap-file-howto/

创建交换文件
建议用户自己配置交换文件
建议你创建一个交换文件,这样操作系统不会再MemSQL用光内存的时候立刻杀掉进程,这有一个创建交互文件的例子:
snapshot_dir=”/var/lib/memsql/data/snapshots”
snapshot_file=” snapshotdir/memsql.swpmkdirp snapshot_dir
dd if=/dev/zero of= snapshotfilebs=1Mcount=10240chmod600 snapshot_file
mkswap snapshotfileswapon snapshot_file
echo “$snapshot_file swap swap defaults 0 0” | tee -a /etc/fstab
http://www.cyberciti.biz/faq/linux-add-a-swap-file-howto/

Ensure port 3306 is available on every cluster host The default port
used by MemSQL is 3306, which is configurable. For the smoothest user
experience, make sure this port is accessible and unoccupied. If
another process is already running on the system and occupying port
3306, you will see an error in MemSQL similar to: Shell 120501
3:04:15 [ERROR] Can’t start server: Bind on TCP/IP port: Address
already in use 120501 3:04:15 [ERROR] Do you already have another
mysqld server running on port: 3306 ? 120501 3:04:15 [ERROR] Aborting
To workaround this, update the MemSQL port value in the memsql.cnf
file to an available port. Thereafter, connect to the database using
the new port, i.e. mysql -u root -h 127.0.0.1 -P
–prompt=”memsql> “.

确认3306在集群的每一台机子上都没有被占用
MemSQL的默认端口是3306,这个是可以配置的,为了好的体验,请确认这个端口是可访问的并且没有被占用。如果端口被占用。你会看到与下面相似的报错:
120501 3:04:15 [ERROR] Can’t start server: Bind on TCP/IP port: Address already in use
120501 3:04:15 [ERROR] Do you already have another mysqld server running on port: 3306 ?
120501 3:04:15 [ERROR] Aborting
为了解决这问题,在memsql.cnf中设置成可用的端口号,从这以后,用新的端口号连接数据库 mysql -u root -h 127.0.0.1 -P –prompt=”memsql> “

Recommendations for Optimal On-Premise Columnstore Performance We
recommend the P3600/P3700 Intel SSDs. Update the devices to the latest
firmware using the Intel SSD Data Center Tool. We support the EXT4
filesystem using the discard and noatime mount options. We currently
do not support XFS. Many improvements have been made recently in Linux
for NVMe devices, so we recommend using a 3.0+ series kernel. For
example, CentOS 7.2 uses the 3.10 kernel. Set the following parameters
in Linux (make it permanent in /etc/rc.local): Shell # Set
DEVICENUMBERforeachdeviceecho0>/sys/block/nvme {DEVICE_NUMBER}n1/queue/add_random echo 1 >
/sys/block/nvme DEVICENUMBERn1/queue/rqaffinityechonone>/sys/block/nvme {DEVICE_NUMBER}n1/queue/scheduler echo 1023 >
/sys/block/nvme${DEVICE_NUMBER}n1/queue/nr_requests

优化列式存储的建议
我们建议 P3600/P3700的Intel SSD,在https://downloadcenter.intel.com/download/23931/Intel-SSD-Data-Center-Tool中更新最新的固件,我们支持EXT4文件系统,
使用discard和noatime选项,我们目前不支持XFS。
在NVMe设备上的linux做了很多的提高,所以我们建议用至少3.0以上的核
在linux下设置以下参数(在/etc/rc.local中永久设置):

Set ${DEVICE_NUMBER} for each device

echo 0 > /sys/block/nvme DEVICENUMBERn1/queue/addrandomecho1>/sys/block/nvme {DEVICE_NUMBER}n1/queue/rq_affinity
echo none > /sys/block/nvme DEVICENUMBERn1/queue/schedulerecho1023>/sys/block/nvme {DEVICE_NUMBER}n1/queue/nr_requests

Disable requiretty For the smoothest user experience of automatically
provisioning the MemSQL cluster, MemSQL Ops needs the ability to SSH
into other hosts outside an interactive shell/session. As such, it
requires that requiretty is disabled. This can be done by modifying
the /etc/sudoers file and commenting out any lines that reference
requiretty.

设置requirtty为disable
为了更好的用户体验,MemSQL Ops需要SSH连接到其他的主机,所以需要设置requiretty为disable,可以在/etc/sudoers中设置

Ensure working SSH between cluster hosts For MemSQL Agent to
successfully download and install itself in other hosts in the
cluster, it needs to connect via SSH from the primary host (where the
primary MemSQL Agent resides) to other hosts in the cluster. For the
best user experience, check to make sure SSH is possible between the
primary host and the other hosts in the cluster. If this is not
possible, you will need to install the MemSQL Agent manually in all
cluster hosts.

确认各个集群主机之间的SSH连接
为了MemSQL Agent能够成功的在集群上的其他节点下载和安装他自己,他需要通过SSH链接原始主机到其他的主机,为了最好的用户体验,确认二者之间的SSH是有链接的,
如果没有连接,你需要手动的在所有主机上安装MemSQL Agent

Start a new login shell after running install.sh The install.sh script
creates a memsql user and memsql group within the Linux machine, and
adds the user running install.sh to the memsql group. To successfully
complete the addition of your user into the memsql group, you will
need to restart your shell.

在运行install.sh之后运行一个新的登录窗口
install.sh脚本在linux机器上创建了一个memsql用户和memsql组,把用户增加到memsql组,为了成功的把当前用户加到memsql组,你需要重新启动你的shell窗口。

Create the same Linux user with sudo permissions on every cluster host
Setting up MemSQL using the MemSQL Agent is easiest if the user and
password is the same for every node. The MemSQL Agent will use this
user to connect to all hosts in the cluster and install / deploy
MemSQL.

在每一个集群主机上使用sudo权限创建一样的linux用户
如果每个节点的用户名和密码是一样的使用MemSQL Agent创建集群是最简单的,MemSQL将要使用这个用户链接到所有主机去安装部署MemSQL。

Ensure port 9000 is acessible on every cluster host The default port
opened by the MemSQL Ops is 9000, which is configurable. For the
smoothest user experience, open this port on every host, and ensure it
is accessible via a browser. For deployments on public cloud platforms
(e.g. AWS, Azure), make sure the proper security group is configured
for your cluster hosts that allow public access to port 9000. If it is
not possible to open port 9000, MemSQL Ops agents can be started on
other ports using the –port flag in memsql-ops start.

确认每个集群主机的9000端口是可以访问的
MemSQL Ops默认的端口是9000,这是可配置的,为了最好的用户体验,在每一个主机上开启这个端口,确信他可以通过浏览器访问,为了在公共云平台上安装,确认为你的
集群配置了合适的安全组,可以访问9000端口。如果9000端口被占用,使用memsql-ops –port再启动memsql-ops时候设置端口

Install numactl on machines with multiple sockets For optimal
performance, MemSQL Ops automatically detects if a machine has
multiple sockets and recommends that MemSQL be deployed in a
NUMA-aware manner. Specifically, MemSQL Ops will run numactl commands
to bind individual MemSQL nodes to CPUs. This allows faster access to
in-memory data, since individual MemSQL nodes only access data that’s
collocated with their corresponding CPU. For MemSQL Ops to enable
NUMA, you need to install the numactl package in your Linux host: sudo
apt-get install numactl

使用多个socket在机器上安装numactl
为了更好的性能,MemSQL Ops自动检测机器上是否有多个MemSQL并建议以NUMA-aware方式来部署MemSQL,特殊说明的,MemSQL Ops将要运行numactl命令来绑定MemSQL 节点
和CPU,这允许更快地访问内存数据,因为独立的MemSQL及诶电脑只能访问他们对应CPU的数据,为了让MemSQL Ops允许NUMA,你需要安装numactl包:
sudo apt-get install numactl

Ensure /tmp has free space or change TMPDIR MemSQL Ops, like many Unix
utilities, writes temporary data to /tmp and requires available free
space. It is possible to change the temporary directory by setting the
canonical Unix environment variable TMPDIR.

确认/tmp有空间或者改变TMPDIR
MemSQL Ops,向很多unix工具,向tmp文件夹写入临时数据并需要剩余空间,可以通过设置unix的环境变量TMPDIR来设置文件目录

Setting Default Functionality
bind-address 0.0.0.0 如果地址是0.0.0.0, 试图连接所有的网络接口, 否则它只连接指定IP地址接口.
flush_before_replicate OFF 如果被设置, 数据被发送到从节点之前先被刷新到本地磁盘. 它将要增加复制的时延,但是保证从节点不会比主节点先复制数据
master_aggregator no default value 不设置的时候本机就是 master aggregator, 否则设置成host:port就是master aggregator的地址
maximum_memory 90% of System RAM MemSQL可以使用的最大内存 (MB). 默认格式是所有机器内存的百分数
maximum_table_memory 90% of maximum_memory 存储表格的内存,默认是上一次参数的百分数.
port 3306 用于连接的端口号.
port_open_timeout 0 得知服务器端口号被占用之后等待的时间(0代表着不等待).
redundancy_level 1 如果设置成1,没有复制. 如果设置成2, 打开 MemSQL’s :doc:High Availibility Mode .
reported_hostname no default value 本机的IP地址或者主机名,用于和其他的机器通信.
snapshot_trigger_size 268435456 byte 日志达到什么大小应该开启一个新镜像.
snapshots_to_keep 2 应该保留几个镜像和日志文件用于备份和复制.
user, u current user 作为user运行memsqld.
datadir ./data 数据目录,这个目录包括镜像,日志和列式存储
plancachedir ./plancache Plancache路径. 这个路径包含代码生成的编译计划路径.
tracelogsdir ./tracelogs Tracelogs路径. 这个目录包括日志文件, 包括memsql.log 和查询日志.
Cluster Management Settings
Setting Default Functionality
aggregator_failure_detection ON Aggregators是否应该检测其他aggregators的失败.
auto_attach ON 如果aggregator的重复设置成 1, 它死之后是不是应该自动连接其他节点
distributed_heartbeat_timeout 10 分布式的查询超时.
leaf_failure_detection ON master aggregator是否应该检测叶子节点的失败.

Connection Management Settings
Setting Default Functionality
connect_timeout 10 memsqld服务器等待一个连接的秒数.
interactive_timeout 28800 服务器等待应用程序多少秒不反应就关闭连接.
max_allowed_packet 104857600 协议包的最大大小.
max_connect_errors 10 如果一个主机打断的连接超过这个数量,这个主机将来的连接将会被阻塞.
max_connections 151 同时在线的客户端的数量.
max_connection_threads 8192 处理连接的最大内核线程数量.
max_pooled_connections 1024 每个叶子存储的最多的连接.
skip_name_resolve AUTO 是否实施域名解析.OFF, ON, or AUTO. 默认的AUTO在有根据主机的安全策略的时候将要实施反向DNS查找.
sync_slave_timeout 10000 以milliseconds为单位设置 master等待同步复制的slave的确认反馈.
wait_timeout 28800 以seconds为单位服务器在这段时间内等待不到连接就关闭.

Database Optimization Settings
Setting Default Functionality
buffered_rows 1024 分布式join中缓存的行数量.
columnstore_disk_insert_threshold 0.5 到了这个阀值( columnar_segment_rows的分数), 多个插入 columnstore 将要被直接写入磁盘.
columnar_segment_rows 102400 一个段文件的最大行数.
columnstore_window_size 2147483648 从节点保存重复的列式存储数据的总大小.更大的大小将要防止从节点上重复的列式存储的表失去同步性.
default_partitions_per_leaf 8 每个叶子节点的默认分区数量.
load_data_read_size 8192 LOAD DATA一次读的byte.
load_data_write_size 8192 LOAD DATA一次写的byte.
lock_wait_timeout 60 以秒为单位返回等待row lock的最长事件,等不到返回错误.
max_prepared_stmt_count 16382 同时prepared statements的最大数量.
multi_insert_tuple_count 1000 aggregators在多插入下发送到叶子节点的元组数量
net_first_packet_read_timeout 30 以second为单位设置等待一个空转的阻塞了DDL操作事务的时间.
net_read_timeout 3600 以秒为单位在设置的时间内从一个连接没有更多的数据过来就放弃这个连接.
net_write_timeout 3600 以秒为单位设置在指定时间内一个连接没有写入数据就放弃写.
optimize_columnar_tables ON 在后台线程优化列式存储.
plan_expiration_minutes 720 一个查询计划存活的时间.
query_parallelism 0 并发运行查询的最大数量.
recovery_batch_size 5 从磁盘中恢复每次读取多少行数据.
replication_timeout_ms 60000 重复超时的milliseconds数
rows_per_period 1 每一期SimpleStreamingIterator 的拉出的行数量.
transaction_buffer 67108864 MemSQL内存中的事务缓存的大小.

Geospatial Settings
Setting Default Functionality
geo_sphere_radius 6367444.657120 半球半径大小,一般用于距离的计算.

Logging Settings
Setting Default Functionality
core_file ON 在崩溃的时候生成完整的核心缓存.
critical_diagnostics ON 向MemSQL发送用法和关键的错误诊断.
debug_mode OFF 代码生成阶段产生debug信息.
general_log OFF 把日志连接和查询放到表或者日志文件,OFF – 不打印日志, ON – 打印为一个日志, PARTIAL- 负载不大的时候打印日志
general_log_file /var/lib/memsql/tracelogs/query.log 日志连接和查询放在指定文件.
warn_level WARNINGS 不常见的语法支持:ERRORS,WARNINGS 或者EXPERIMENTAL.

Security Settings
Setting Default Functionality
ssl_ca none CA文件,用于SSL.
ssl_capath none CA目录,用于SSL.
ssl_cert none 证书文件,用于SSL.
ssl_cipher none 密码,用于SSL.
ssl_key none 公钥,私钥对,用于SSL.

Logo

更多推荐