Deploying a Flink Cluster with Rancher

Date: 2023-01-04 02:02:29

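Rancher version: v2.6.8

Kubernetes version: v1.22.13+rke2r1

Flink version: 1.15.0

Flink deployment mode: session cluster

Foreword: I ran into many problems while following the official installation instructions, so I am recording the steps here to avoid hitting the same pitfalls again.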

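Contents

I. Installing Flink with Docker (official instructions)

1. Pre-installation setup

2. Installing the jobmanager

3. Installing the taskmanager

II. Deploying the Flink cluster with Rancher

1. Creating the jobmanager service

2. Creating the taskmanager service

3. Verifying the cluster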

I. Installing Flink with Docker (official instructions)

1. Pre-installation setup

$ FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager"
$ docker network create flink-network

2. Installing the jobmanager

$ docker run \
    --rm \
    --name=jobmanager \
    --network flink-network \
    --publish 8081:8081 \
    --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
    flink:1.15.4-scala_2.12 jobmanager

3. Installing the taskmanager

$ docker run \
    --rm \
    --name=taskmanager \
    --network flink-network \
    --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
    flink:1.15.4-scala_2.12 taskmanager

Then open the Flink UI at localhost:8081.
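The jobmanager container takes a few seconds to start, so the UI may not answer immediately. A minimal sketch for waiting until the published port accepts connections is shown here; the function name `wait_for_port` and the timeout values are my own assumptions, not part of the Flink image or the official instructions.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves the port is open,
            # not that the Flink UI is fully initialized.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)
    return False
```

For the Docker setup above, calling `wait_for_port("localhost", 8081)` should return True once the jobmanager is listening on the published port.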

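II. Deploying the Flink cluster with Rancher

1. Creating the jobmanager service

Service name: flink-jobmanager-one

Image: flink:1.15.0

Port configuration

Cluster IP: 6123 (used for communication within the cluster)

Node port: 8081:30099 (exposes the Flink UI; the node port must fall within the Kubernetes NodePort range, which is 30000-32767 by default)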
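As a quick illustration of the node port constraint, the sketch below splits a `container:node` mapping such as the one above and checks the node port against the default Kubernetes range. The helper names are hypothetical, and the range is an assumption: a cluster administrator can change it with the kube-apiserver flag `--service-node-port-range`.

```python
# Default Kubernetes NodePort range (assumption: the cluster has not
# overridden it via --service-node-port-range).
NODE_PORT_MIN = 30000
NODE_PORT_MAX = 32767


def parse_port_mapping(mapping):
    """Split a mapping like '8081:30099' into (container_port, node_port)."""
    container, node = mapping.split(":")
    return int(container), int(node)


def is_valid_node_port(port, lo=NODE_PORT_MIN, hi=NODE_PORT_MAX):
    """Check whether a port lies inside the allowed NodePort range."""
    return lo <= port <= hi


container_port, node_port = parse_port_mapping("8081:30099")
print(container_port, node_port, is_valid_node_port(node_port))
```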

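Configuring the arguments and environment variables

Arguments: jobmanager (marks this service as the jobmanager, as opposed to taskmanager)

Environment variable: the contents of the flink-conf.yaml file

Contents of flink-conf.yaml: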

parallelism.default: 1
rest.address: 0.0.0.0
rest.bind-address: 0.0.0.0
blob.server.port: 6124
query.server.port: 6125
taskmanager.bind-host: 0.0.0.0
taskmanager.numberOfTaskSlots: 100
taskmanager.memory.process.size: 34560m
jobmanager.rpc.port: 6123
jobmanager.bind-host: 0.0.0.0
jobmanager.rpc.address: flink-jobmanager-one
jobmanager.execution.failover-strategy: region
jobmanager.memory.heap.size: 16000m
jobmanager.memory.jvm-metaspace.size: 1600m
jobmanager.memory.jvm-overhead.max: 20480m
jobmanager.memory.jvm-overhead.min: 1m
jobmanager.memory.off-heap.size: 1600m
jobmanager.memory.process.size: 20480m
kubernetes.cluster-id: default
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: file:///opt/flink/ha-storage
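The jobmanager memory options above fix the heap, off-heap, metaspace and total process size explicitly, which leaves the JVM overhead as whatever remains. Assuming Flink's documented memory model (total process memory = heap + off-heap + metaspace + JVM overhead), the following minimal sketch cross-checks that the remainder fits between jvm-overhead.min and jvm-overhead.max. The helper names are hypothetical, and only the `m` suffix used in this article is handled.

```python
def parse_mb(value):
    """Parse a size such as '20480m' into megabytes ('m' suffix only)."""
    if not value.endswith("m"):
        raise ValueError(f"unsupported size: {value!r}")
    return int(value[:-1])


def derived_jvm_overhead_mb(conf):
    """Overhead left after heap, off-heap and metaspace are subtracted."""
    process = parse_mb(conf["jobmanager.memory.process.size"])
    heap = parse_mb(conf["jobmanager.memory.heap.size"])
    off_heap = parse_mb(conf["jobmanager.memory.off-heap.size"])
    metaspace = parse_mb(conf["jobmanager.memory.jvm-metaspace.size"])
    return process - heap - off_heap - metaspace


# Values copied from the jobmanager configuration in this article.
conf = {
    "jobmanager.memory.process.size": "20480m",
    "jobmanager.memory.heap.size": "16000m",
    "jobmanager.memory.off-heap.size": "1600m",
    "jobmanager.memory.jvm-metaspace.size": "1600m",
    "jobmanager.memory.jvm-overhead.min": "1m",
    "jobmanager.memory.jvm-overhead.max": "20480m",
}

overhead = derived_jvm_overhead_mb(conf)
lo = parse_mb(conf["jobmanager.memory.jvm-overhead.min"])
hi = parse_mb(conf["jobmanager.memory.jvm-overhead.max"])
print(overhead, lo <= overhead <= hi)  # prints: 1280 True
```

With these numbers, 20480 - 16000 - 1600 - 1600 leaves 1280 MB for JVM overhead, which lies inside the configured 1m to 20480m window.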

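The detailed configuration is shown in the figure below. The key must be FLINK_PROPERTIES, and the value is the full contents of flink-conf.yaml from the Flink installation directory; the unneeded comments in the file have been removed for readability.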
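To make the shape of that value concrete, here is a minimal sketch that renders a settings dictionary into the newline-separated `key: value` text expected in FLINK_PROPERTIES. The function name is hypothetical, and only three of the settings from this article are shown. As far as I know, the official image's entrypoint appends this text to its Flink configuration file at startup, but treat that as an assumption to verify against the image you use.

```python
def to_flink_properties(settings):
    """Render settings as newline-separated 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


# A small subset of the jobmanager settings from this article.
settings = {
    "jobmanager.rpc.address": "flink-jobmanager-one",
    "jobmanager.rpc.port": 6123,
    "taskmanager.numberOfTaskSlots": 100,
}

# The environment variable key must be exactly FLINK_PROPERTIES.
env = {"FLINK_PROPERTIES": to_flink_properties(settings)}
print(env["FLINK_PROPERTIES"])
```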

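In addition, the jar files under Flink's lib directory need to be mounted, and in high-availability mode the HA storage path must be mounted as well. I used an NFS mount, as shown in the figure below (blurred to avoid trouble); set the actual paths according to your own environment.

That completes the jobmanager configuration. Click Save to start the service.

Check the service startup log. If it looks like the following and contains no other errors, the service started normally.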

-04-21 03:56:14,800 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Starting StandaloneSessionClusterEntrypoint.
-04-21 03:56:14,860 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Install default filesystem.
-04-21 03:56:14,865 INFO org.apache.flink.core.fs.FileSystem [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
-04-21 03:56:14,947 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Install security context.
-04-21 03:56:14,967 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
-04-21 03:56:14,975 INFO org.apache.flink.runtime.security.modules.JaasModule [] - Jaas file will be created as /tmp/jaas-11053256423370857721.conf.
-04-21 03:56:14,988 INFO org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
-04-21 03:56:14,993 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Initializing cluster services.
-04-21 03:56:15,004 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Using working directory: WorkingDirectory(/tmp/jm_016b9c4949a2c1712d36faf922b30dc2).
-04-21 03:56:15,437 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address flink-jobmanager-one:6123, bind address 0.0.0.0:6123.
-04-21 03:56:16,875 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started
-04-21 03:56:16,924 INFO akka.remote.RemoteActorRefProvider [] - Akka Cluster not in use - enabling unsafe features anyway because `akka.remote.use-unsafe-remote-features-outside-cluster` has been enabled.
-04-21 03:56:16,925 INFO akka.remote.Remoting [] - Starting remoting
-04-21 03:56:17,192 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink@flink-jobmanager-one:6123]
-04-21 03:56:17,402 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink@flink-jobmanager-one:6123
-04-21 03:56:17,815 INFO org.apache.flink.runtime.blob.FileSystemBlobStore [] - Creating highly available BLOB storage directory at file:/opt/flink/ha-storage/default/blob
-04-21 03:56:19,090 INFO org.apache.flink.runtime.blob.BlobServer [] - Created BLOB server storage directory /tmp/jm_016b9c4949a2c1712d36faf922b30dc2/blobStorage
-04-21 03:56:19,096 INFO org.apache.flink.runtime.blob.BlobServer [] - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000
-04-21 03:56:19,116 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl [] - No metrics reporter configured, no metrics will be exposed/reported.
-04-21 03:56:19,122 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address flink-jobmanager-one:0, bind address 0.0.0.0:0.
-04-21 03:56:19,155 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started
-04-21 03:56:19,162 INFO akka.remote.RemoteActorRefProvider [] - Akka Cluster not in use - enabling unsafe features anyway because `akka.remote.use-unsafe-remote-features-outside-cluster` has been enabled.
-04-21 03:56:19,162 INFO akka.remote.Remoting [] - Starting remoting
-04-21 03:56:19,177 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink-metrics@flink-jobmanager-one:44181]
-04-21 03:56:19,195 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink-metrics@flink-jobmanager-one:44181
-04-21 03:56:19,218 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService .
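Reading a startup log by eye gets tedious, so here is a rough sketch of how the "no other errors" check could be automated. It assumes the log layout seen above (a level word such as INFO, WARN or ERROR near the start of each entry) and looks for the `Starting StandaloneSessionClusterEntrypoint` line as the startup marker. The function name is hypothetical, and the sample text is two entries copied from the log above.

```python
import re

# Matches the first log level word in a line (WARNING does not match WARN
# because of the trailing word boundary).
LEVEL_RE = re.compile(r"\b(INFO|WARN|ERROR)\b")


def summarize_log(text):
    """Return (started, problems): startup marker seen, and WARN/ERROR lines."""
    started = False
    problems = []
    for line in text.splitlines():
        if "Starting StandaloneSessionClusterEntrypoint" in line:
            started = True
        match = LEVEL_RE.search(line)
        if match and match.group(1) in ("WARN", "ERROR"):
            problems.append(line)
    return started, problems


sample = (
    "-04-21 03:56:14,800 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Starting StandaloneSessionClusterEntrypoint.\n"
    "-04-21 03:56:19,096 INFO org.apache.flink.runtime.blob.BlobServer [] - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000\n"
)
started, problems = summarize_log(sample)
print(started, len(problems))
```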

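2. Creating the taskmanager service

Service name: flink-taskmanager-one

Image: flink:1.15.0

The taskmanager does not need any port configuration.

Configuring the arguments and environment variables

Arguments: taskmanager (marks this service as the taskmanager, as opposed to jobmanager)

Environment variable: the contents of the flink-conf.yaml file

Contents of flink-conf.yaml: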

parallelism.default: 1
rest.address: 0.0.0.0
rest.bind-address: 0.0.0.0
blob.server.port: 6124
query.server.port: 6125
taskmanager.bind-host: 0.0.0.0
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 100
jobmanager.rpc.port: 6123
jobmanager.bind-host: 0.0.0.0
jobmanager.rpc.address: flink-jobmanager-one
jobmanager.execution.failover-strategy: region
jobmanager.memory.process.size: 1600m
kubernetes.cluster-id: default
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: file:///opt/flink/nfs/ha-storage
execution.checkpointing.interval: 3min
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
execution.checkpointing.max-concurrent-checkpoints: 1
execution.checkpointing.min-pause: 0
execution.checkpointing.mode: EXACTLY_ONCE
execution.checkpointing.timeout: 10min
execution.checkpointing.tolerable-failed-checkpoints: 0
execution.checkpointing.unaligned: true
state.backend: rocksdb
state.checkpoints.dir: file:///home/flink/checkpoints
state.savepoints.dir: file:///home/flink/flink/savepoints
state.backend.incremental: true
restart-strategy: failure-rate
restart-strategy.failure-rate.max-failures-per-interval: 10
restart-strategy.failure-rate.failure-rate-interval: 300s
restart-strategy.failure-rate.delay: 15s

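Besides Flink's lib directory, the taskmanager also needs the savepoint and checkpoint directories mounted (if you only want to get the cluster running as a trial, you can skip this part).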
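To see which container paths need a volume behind them, the following sketch pulls the `file://` locations out of the taskmanager settings. The function name and the list of keys it inspects are my own choices for illustration; the values are copied from the configuration above.

```python
from urllib.parse import urlparse

# Settings in this article that point at a local filesystem location.
PATH_KEYS = (
    "high-availability.storageDir",
    "state.checkpoints.dir",
    "state.savepoints.dir",
)


def local_mount_paths(conf):
    """Return the container paths referenced by file:// URIs in conf."""
    paths = []
    for key in PATH_KEYS:
        value = conf.get(key)
        if value is None:
            continue
        parsed = urlparse(value)
        if parsed.scheme == "file":
            paths.append(parsed.path)
    return paths


conf = {
    "high-availability.storageDir": "file:///opt/flink/nfs/ha-storage",
    "state.checkpoints.dir": "file:///home/flink/checkpoints",
    "state.savepoints.dir": "file:///home/flink/flink/savepoints",
}
for path in local_mount_paths(conf):
    print(path)
```

Each printed path is a directory inside the container that should be backed by a persistent mount (NFS in my case) if checkpoints and savepoints are meant to survive a pod restart.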

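That completes the taskmanager configuration. Click Save to start the service.

Check the service startup log. If it looks like the following and contains no other errors, the service started normally.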

-04-21 03:59:07,347 INFO org.apache.flink.core.fs.FileSystem [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
-04-21 03:59:07,475 INFO org.apache.flink.runtime.state.changelog.StateChangelogStorageLoader [] - StateChangelogStorageLoader initialized with shortcut names {memory,filesystem}.
-04-21 03:59:07,485 INFO org.apache.flink.runtime.state.changelog.StateChangelogStorageLoader [] - StateChangelogStorageLoader initialized with shortcut names {memory,filesystem}.
-04-21 03:59:07,507 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
-04-21 03:59:07,527 INFO org.apache.flink.runtime.security.modules.JaasModule [] - Jaas file will be created as /tmp/jaas-9454914976650115513.conf.
-04-21 03:59:07,540 INFO org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
-04-21 03:59:08,538 INFO org.apache.flink.runtime.blob.FileSystemBlobStore [] - Creating highly available BLOB storage directory at file:/opt/flink/ha-storage/default/blob
-04-21 03:59:09,872 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='default-cluster-config-map'}.
-04-21 03:59:09,872 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer [] - Starting to watch for default/default-cluster-config-map, watching id:7c948823-ecdd-4ba4-b6be-568113b6a8ad
-04-21 03:59:09,874 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - Trying to select the network interface and address to use by connecting to the leading JobManager.
-04-21 03:59:09,874 INFO org.apache.flink.runtime.util.LeaderRetrievalUtils [] - TaskManager will try to connect for PT10S before falling back to heuristics
-04-21 03:59:15,278 INFO org.apache..ConnectionUtils [] - Trying to connect to address flink-jobmanager-one/****:6123
-04-21 03:59:15,380 INFO org.apache..ConnectionUtils [] - Failed to connect to [flink-jobmanager-one/10.43.79.104:6123] from local address [localhost/127.0.0.1] with timeout [100] due to: connect timed out
-04-21 03:59:15,381 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Stopping DefaultLeaderRetrievalService.
-04-21 03:59:15,381 INFO org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver [] - Stopping KubernetesLeaderRetrievalDriver{configMapName='default-cluster-config-map'}.
-04-21 03:59:15,382 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - TaskManager will use hostname/address 'flink-taskmanager-one-5c484b8685-rrsct' (10.42.2.35) for communication.
-04-21 03:59:15,382 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer [] - Stopped to watch for default/default-cluster-config-map, watching id:7c948823-ecdd-4ba4-b6be-568113b6a8ad
-04-21 03:59:15,482 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address ****:0, bind address 0.0.0.0:0.
-04-21 03:59:16,610 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started
-04-21 03:59:16,661 INFO akka.remote.RemoteActorRefProvider [] - Akka Cluster not in use - enabling unsafe features anyway because `akka.remote.use-unsafe-remote-features-outside-cluster` has been enabled.
-04-21 03:59:16,662 INFO akka.remote.Remoting [] - Starting remoting
-04-21 03:59:16,927 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink@10.42.2.35:33783]
-04-21 03:59:17,142 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink@*****:33783
-04-21 03:59:17,171 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Using working directory: WorkingDirectory(/tmp/tm_10.42.2.35:33783-56875c)
-04-21 03:59:17,187 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl [] - No metrics reporter configured, no metrics will be exposed/reported.
-04-21 03:59:17,194 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Trying to start actor system, external address *****:0, bind address 0.0.0.0:0.
-04-21 03:59:17,226 INFO akka.event.slf4j.Slf4jLogger [] - Slf4jLogger started
-04-21 03:59:17,234 INFO akka.remote.RemoteActorRefProvider [] - Akka Cluster not in use - enabling unsafe features anyway because `akka.remote.use-unsafe-remote-features-outside-cluster` has been enabled.
-04-21 03:59:17,234 INFO akka.remote.Remoting [] - Starting remoting
-04-21 03:59:17,249 INFO akka.remote.Remoting [] - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.42.2.35:34252]
-04-21 03:59:17,272 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils [] - Actor system started at akka.tcp://flink-metrics@10.42.2.35:34252
-04-21 03:59:17,298 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/rpc/MetricQueryService_10.42.2.35:33783-56875c .
-04-21 03:59:17,350 INFO org.apache.flink.runtime.blob.PermanentBlobCache [] - Created BLOB cache storage directory /tmp/tm_10.42.2.35:33783-56875c/blobStorage
-04-21 03:59:17,360 INFO org.apache.flink.runtime.blob.TransientBlobCache [] - Created BLOB cache storage directory /tmp/tm_10.42.2.35:33783-56875c/blobStorage
-04-21 03:59:17,367 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
-04-21 03:59:17,368 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Starting TaskManager with ResourceID: *****:33783-56875c
-04-21 03:59:17,436 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices [] - Temporary file directory '/tmp': total 549 GB, usable 447 GB (81.42% usable)
-04-21 03:59:17,442 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager [] - Created a new FileChannelManager for spilling of task related data to disk (joins, sorting, ...). Used directories:/tmp/flink-io-ebe2ab9f-f60f-4b6b-be7a-b090613e845e
-04-21 03:59:17,453 INFO org.apache.flink.tyConfig [] - NettyConfig [server address: /0.0.0.0, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: AUTO, number of server threads: 100 (manual), number of client threads: 100 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
-04-21 03:59:17,593 INFO org.apache.flink.tyShuffleServiceFactory [] - Created a new FileChannelManager for storing result partitions of BLOCKING shuffles. Used directories:/tmp/flink-netty-shuffle-bb8f836c-1bde-446a-955d-d771f33a18f0
-04-21 03:59:17,760 INFO org.apache.flink.workBufferPool [] - Allocated 128 MB for network buffer pool (number of memory segments: 4096, bytes per segment: 32768).
-04-21 03:59:17,782 INFO org.apache.flink.tyShuffleEnvironment [] - Starting the network environment and its components.
-04-21 03:59:17,923 INFO org.apache.flink.tyClient [] - Transport type 'auto': using EPOLL.
-04-21 03:59:17,927 INFO org.apache.flink.tyClient [] - Successful initialization (took 144 ms).
-04-21 03:59:17,955 INFO org.apache.flink.tyServer [] - Transport type 'auto': using EPOLL.
-04-21 03:59:18,023 INFO org.apache.flink.tyServer [] - Successful initialization (took 93 ms). Listening on SocketAddress /0:0:0:0:0:0:0:0%0:44943.
-04-21 03:59:18,025 INFO org.apache.flink.runtime.taskexecutor.KvStateService [] - Starting the kvState service and its components.
-04-21 03:59:18,068 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/rpc/taskmanager_0 .
-04-21 03:59:18,098 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer [] - Starting to watch for default/default-cluster-config-map, watching id:fa1d1e62-3096-49e5-98c6-33cefd409f64
-04-21 03:59:18,098 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver{configMapName='default-cluster-config-map'}.
-04-21 03:59:18,100 INFO org.apache.flink.runtime.taskexecutor.DefaultJobLeaderService [] - Start job leader service.
-04-21 03:59:18,102 INFO org.apache.flink.runtime.filecache.FileCache [] - User file cache uses directory /tmp/flink-dist-cache-9504579b-10e1-40d9-8432-5f3e84862ae8
-04-21 03:59:19,558 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager-one:6123/user/rpc/resourcemanager_0(ac2fae3954b4df272f119171641343f0).
-04-21 03:59:19,808 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Resolved ResourceManager address, beginning registration
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by ty.util.internal.ByteBufferUtil (file:/tmp/flink-rpc-akka_3755f8e5-ec9a-425f-9815-a45ebb25f358.jar) to method java.nio.DirectByteBuffer.cleaner()
WARNING: Please consider reporting this to the maintainers of ty.util.internal.ByteBufferUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
-04-21 03:59:19,923 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Successful registration at resource manager akka.tcp://flink@flink-jobmanager-one:6123/user/rpc/resourcemanager_0 under registration id f679013e892a684acbbda4efc6a3a0b0.
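The entry that matters most in the taskmanager log is the final one, which reports that the taskmanager registered with the jobmanager's resource manager. A minimal sketch for picking it out of a log is given here; the regular expression is written against the wording seen in this article's log and may need adjusting for other Flink versions, and the function name is hypothetical.

```python
import re

# Pattern based on the registration message shown in this article's log.
REGISTRATION_RE = re.compile(
    r"Successful registration at resource manager (\S+) "
    r"under registration id (\w+)"
)


def find_registration(text):
    """Return (resource_manager_address, registration_id) or None."""
    match = REGISTRATION_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = (
    "-04-21 03:59:19,923 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - "
    "Successful registration at resource manager "
    "akka.tcp://flink@flink-jobmanager-one:6123/user/rpc/resourcemanager_0 "
    "under registration id f679013e892a684acbbda4efc6a3a0b0."
)
print(find_registration(sample))
```

If the function returns None for a taskmanager that has been running for a while, the jobmanager address or port 6123 is the first thing I would check.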

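3. Verifying the cluster

Open the node address you configured together with the exposed node port in the 30000 range. If the Flink UI shows the jobmanager and taskmanager configuration as in the two figures below, the installation succeeded.

The jobmanager displays normally:

The taskmanagers display normally, one entry per taskmanager you configured: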
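The same check can be done without the browser by querying the Flink REST API, which is served on the same port as the UI. The sketch below assumes the `/overview` endpoint and its `taskmanagers` and `slots-total` fields; the function names, the expected taskmanager count and the base URL are placeholders to adapt to your own environment.

```python
import json
from urllib.request import urlopen


def fetch_overview(base_url, timeout_s=5.0):
    """GET <base_url>/overview from the Flink REST API and decode the JSON."""
    with urlopen(f"{base_url}/overview", timeout=timeout_s) as resp:
        return json.load(resp)


def cluster_ready(overview, expected_taskmanagers=1):
    """True if enough taskmanagers registered and at least one slot exists."""
    return (
        overview.get("taskmanagers", 0) >= expected_taskmanagers
        and overview.get("slots-total", 0) > 0
    )
```

For example, `cluster_ready(fetch_overview("http://<node-address>:30099"))` would be expected to return True for the setup described here, with `<node-address>` replaced by one of your cluster nodes.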

The installation is now complete.
