A Node.js tool for real-time data synchronization between MongoDB and Elasticsearch (attachment sync supported).
It supports one-to-one and one-to-many data transfer.
elasticsearch: v6.1.2
mongodb: v3.6.2
Node.js: v8.9.3
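node-mongodb-es-connector keeps your MongoDB collections and your Elasticsearch indexes synchronized in real time. It watches the MongoDB oplog to detect changes, so inserts, updates, and deletes are reflected promptly in your Elasticsearch index. Because the oplog only exists on replica set members, your MongoDB deployment must run as a replica set before you use this tool; if it does not, configure one first. (Attachment sync is supported.)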
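To make the oplog-to-index idea concrete, here is a minimal sketch of how a single oplog entry could be turned into Elasticsearch bulk API lines. It is not taken from the connector's source: `oplogToBulk` is a hypothetical helper, and it assumes the usual oplog entry shape (`op` of `i`, `u`, or `d`, the document in `o`, and the target `_id` in `o2` for updates). Only `$set` updates and full replacements are handled; `$unset` and other operators are left out.

```javascript
// Hypothetical helper (not part of the connector): map one MongoDB oplog
// entry to the action/body lines expected by the Elasticsearch bulk API.
function oplogToBulk(entry, index, type) {
  const meta = (id) => ({ _index: index, _type: type, _id: String(id) });
  switch (entry.op) {
    case 'i': { // insert: index the document; _id goes into the bulk metadata
      const { _id, ...doc } = entry.o;
      return [{ index: meta(_id) }, doc];
    }
    case 'u': { // update: o2 carries the _id, o carries the change
      const id = entry.o2._id;
      if (entry.o.$set) {
        return [{ update: meta(id) }, { doc: entry.o.$set }];
      }
      const { _id, ...doc } = entry.o; // full-document replacement
      return [{ index: meta(id) }, doc];
    }
    case 'd': // delete: o carries only the _id
      return [{ delete: meta(entry.o._id) }];
    default: // no-ops, commands, and anything else are ignored in this sketch
      return [];
  }
}
```

Queries never appear in the oplog, so only writes produce bulk actions.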
npm install es-mongodb-sync
Or download it from GitHub.
Create a JSON file in the crawlerData directory. Name it either ElasticSearchIndexName.json or any other name ending in .json.
If you need more configuration files, create them in the same crawlerData directory.
Example:
mybooks.json
{
  "mongodb": {
    "m_database": "myTest",
    "m_collectionname": "books",
    "m_filterfilds": {
      "version": "2.0"
    },
    "m_returnfilds": {
      "bName": 1,
      "bPrice": 1,
      "bImgSrc": 1
    },
    "m_extendfilds": {
      "bA": "this is a extend fild bA",
      "bB": "this is a extend fild bB"
    },
    "m_extendinit": {
      "m_comparefild": "_id",
      "m_comparefildType": "ObjectId",
      "m_startFrom": "2018-07-20 13:44:00",
      "m_endTo": "2018-07-20 13:46:59"
    },
    "m_connection": {
      "m_servers": [
        "localhost:29031",
        "localhost:29032",
        "localhost:29033"
      ],
      "m_authentication": {
        "username": "UserAdmin",
        "password": "pass1234",
        "authsource": "admin",
        "replicaset": "my_replica",
        "ssl": false
      }
    },
    "m_documentsinbatch": 5000,
    "m_delaytime": 1000,
    "max_attachment_size": 5242880
  },
  "elasticsearch": {
    "e_index": "mybooks",
    "e_type": "books",
    "e_connection": {
      "e_server": "http://localhost1:9200,http://localhost2:9200,http://localhost3:9200",
      "e_httpauth": {
        "username": "EsAdmin",
        "password": "pass1234"
      }
    },
    "e_pipeline": "mypipeline",
    "e_iscontainattachment": true
  }
}
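As an illustration of how the connection nodes in this configuration fit together, the sketch below builds a standard MongoDB connection string from the `mongodb` node and splits the comma-separated `e_server` value into a host list. `buildMongoUri` and `splitEsServers` are hypothetical helpers written for this example, and treating `m_authentication` as possibly `null` is an assumption; the connector may assemble its connections differently.

```javascript
// Hypothetical helpers (not part of the connector).

// Build a mongodb:// URI from the "mongodb" node of a config file.
function buildMongoUri(mongo) {
  const conn = mongo.m_connection;
  const auth = conn.m_authentication; // assumed to be null when no login is needed
  const credentials = auth
    ? `${encodeURIComponent(auth.username)}:${encodeURIComponent(auth.password)}@`
    : '';
  const options = [];
  if (auth) {
    if (auth.authsource) options.push(`authSource=${encodeURIComponent(auth.authsource)}`);
    if (auth.replicaset) options.push(`replicaSet=${encodeURIComponent(auth.replicaset)}`);
    if (typeof auth.ssl === 'boolean') options.push(`ssl=${auth.ssl}`);
  }
  const query = options.length ? `?${options.join('&')}` : '';
  return `mongodb://${credentials}${conn.m_servers.join(',')}/${mongo.m_database}${query}`;
}

// Split the comma-separated e_server string into individual host URLs.
function splitEsServers(eServer) {
  return eServer.split(',').map((s) => s.trim()).filter(Boolean);
}
```

With the mybooks.json values above, `buildMongoUri` yields `mongodb://UserAdmin:pass1234@localhost:29031,localhost:29032,localhost:29033/myTest?authSource=admin&replicaSet=my_replica&ssl=false`.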
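Field notes for the example above (default or allowed values, and whether each key is required), matched to its keys by value and order:
Field | Default / allowed values | Required |
---|---|---|
m_filterfilds | null | required |
m_returnfilds | null | required |
m_extendfilds | null | optional |
m_extendinit | null | optional |
m_comparefild | _id or another field | optional |
m_comparefildType | ObjectId or DateTime | optional |
m_authentication | null | required |
authsource | admin | required |
ssl | false | optional |
alternative to the m_connection node (use one of the two) | | optional |
m_delaytime | 1000 ms | required |
max_attachment_size | 5242880 bytes | optional |
e_httpauth | null | optional |
e_iscontainattachment | false | optional |
To start the connector, run:
node app.js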
index.js (used only to create, read, update, and delete configuration files)
1. start() - must be called before any other API.
2. addWatcher() - adds a configuration file.
Parameters:
Name | Type |
---|---|
fileName | string |
obj | jsonObject |
Returns: true or false
3. updateWatcher() - updates a configuration file.
Parameters:
Name | Type |
---|---|
fileName | string |
obj | jsonObject |
Returns: true or false
4. deleteWatcher() - deletes a configuration file.
Parameters:
Name | Type |
---|---|
fileName | string |
Returns: true or false
5. isExistWatcher() - checks whether a configuration file exists.
Parameters:
Name | Type |
---|---|
fileName | string |
Returns: true or false
6. getInfoArray() - returns the current status of each configuration file (waiting/initialling/running/stoped).
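The watcher methods above can be combined into an "add or update" step. The sketch below is one way to do that under stated assumptions: `upsertWatcher` is a hypothetical helper, `connector` stands for whatever object exposes the documented methods, and `start()` is assumed to have been called already. Since it is not stated whether these methods return values directly or via promises, the helper uses `await`, which works in both cases.

```javascript
// Hypothetical helper (not part of the package): add the configuration file
// if it does not exist yet, otherwise update it.
// `connector` is any object exposing isExistWatcher, addWatcher, updateWatcher.
async function upsertWatcher(connector, fileName, obj) {
  const exists = await connector.isExistWatcher(fileName);
  return exists
    ? connector.updateWatcher(fileName, obj)
    : connector.addWatcher(fileName, obj);
}
```

The returned promise resolves to the true or false value reported by the underlying call.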
Update log: getInfoArray(); the m_extendfilds and m_extendinit nodes.
Install the attachment processor plugin:
https://www.elastic.co/guide/en/elasticsearch/plugins/6.3/ingest-attachment.html
More about Elasticsearch pipelines: https://hacpai.com/article/1512990272091
Create a pipeline in Elasticsearch:
PUT _ingest/pipeline/mypipeline
{
  "description": "Extract attachment information from arrays",
  "processors": [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data"
          }
        }
      }
    }
  ]
}
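The pipeline above iterates over an `attachments` array and reads each element's `data` field, which the attachment processor expects to be base64-encoded. As a rough sketch of a document in that shape, the hypothetical helper below encodes file buffers and attaches them to a document. The `filename` key, the helper name, and the choice to skip files larger than `max_attachment_size` are assumptions for illustration; how the connector itself treats oversized files is not described here.

```javascript
// Hypothetical helper (not part of the connector): shape a document so the
// foreach/attachment pipeline can read attachments[i].data as base64.
function withAttachments(doc, files, maxBytes = 5242880) {
  const attachments = files
    .filter((f) => f.content.length <= maxBytes) // assumption: skip oversized files
    .map((f) => ({ filename: f.filename, data: f.content.toString('base64') }));
  return Object.assign({}, doc, { attachments });
}

// Usage: each file is { filename, content } where content is a Buffer.
const example = withAttachments(
  { bName: 'Some book' },
  [{ filename: 'note.txt', content: Buffer.from('hello') }]
);
// example.attachments[0].data === 'aGVsbG8='
```

Indexing such a document with `pipeline=mypipeline` lets Elasticsearch write the extracted text under each element's `attachment` field.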
The MIT License (MIT). Please see LICENSE for more information.