
Kafka source code reading: how the Broker handles consumer fetch and follower replica data sync requests

2024/10/11 09:20:47  Source: https://blog.csdn.net/hmh13548571896/article/details/140587233

Overview

The Kafka source tree is organized into a number of modules, each responsible for a different part of the system. The core modules and their responsibilities are summarized below:

  1. Server-side source: implements the core functionality of the Kafka Broker, including log storage, the controller, coordinators, metadata and state-machine management, the delayed-operation mechanism, consumer group management, and the high-concurrency network architecture.

  2. Java client source: implements how the Producer and Consumer interact with the Broker, plus the shared supporting components.

  3. Connect source: used to build bidirectional streaming synchronization between heterogeneous data systems.

  4. Streams source: implements real-time stream processing functionality.

  5. Raft source: implements the Raft consensus protocol.

  6. Admin module: Kafka's administration module for operating on and managing topics and partitions, including creating and deleting topics and expanding partitions.

  7. Api module: responsible for data exchange, encoding and decoding the data passed between clients and the server.

  8. Client module: contains the classes a Producer uses to read Kafka Broker metadata, such as topics, partitions, and leaders.

  9. Cluster module: contains entity classes such as Broker, Cluster, Partition, and Replica.

  10. Common module: contains the various exception classes and error validation.

  11. Consumer module: consumer-side processing, responsible for the client consumer's data and logic.

  12. Controller module: responsible for electing the central controller, electing partition leaders, assigning or reassigning replicas, and expanding partitions and replicas.

  13. Coordinator module: manages a subset of consumer groups and their offsets.

  14. Javaapi module: provides Java-language Producer and Consumer API interfaces.

  15. Log module: responsible for Kafka's file storage, reading and writing the message data of all topics.

  16. Message module: wraps multiple records into a record set or a compressed record set.

  17. Metrics module: responsible for internal state monitoring.

  18. Network module: handles client connections and network events.

  19. Producer module: producer implementation details, including synchronous and asynchronous message sending.

  20. Security module: responsible for Kafka's security validation and management.

  21. Serializer module: serializes and deserializes message content.

  22. Server module: covers Leader and Offset checkpoints, dynamic configuration, delayed topic creation and deletion, leader election, and Admin and Replica management.

  23. Tools module: a collection of tools, for example for exporting consumer offsets, LogSegment information, a topic's log position information, and offsets stored in ZooKeeper.

  24. Utils module: various utility classes such as Json, ZkUtils, thread-pool utilities, and the KafkaScheduler common scheduler.

Together these modules make up Kafka's overall architecture and allow it to provide a high-throughput, highly available message queue service.

The Kafka source branch used in this article is 1.0.2.

The handleFetchRequest() method of the KafkaApis class is the API entry point; it serves both consumer fetch requests and follower replica sync requests:

  /**
   * Handle a fetch request
   */
  // Handles both consumer fetch requests and follower replica sync requests
  def handleFetchRequest(request: RequestChannel.Request) {
    val fetchRequest = request.body[FetchRequest]
    val versionId = request.header.apiVersion
    val clientId = request.header.clientId

    val unauthorizedTopicResponseData = mutable.ArrayBuffer[(TopicPartition, FetchResponse.PartitionData)]()
    val nonExistingTopicResponseData = mutable.ArrayBuffer[(TopicPartition, FetchResponse.PartitionData)]()
    val authorizedRequestInfo = mutable.ArrayBuffer[(TopicPartition, FetchRequest.PartitionData)]()

    // replicaId >= 0 indicates a follower replica sync request
    if (fetchRequest.isFromFollower() && !authorize(request.session, ClusterAction, Resource.ClusterResource))
      for (topicPartition <- fetchRequest.fetchData.asScala.keys)
        unauthorizedTopicResponseData += topicPartition -> new FetchResponse.PartitionData(Errors.CLUSTER_AUTHORIZATION_FAILED,
          FetchResponse.INVALID_HIGHWATERMARK, FetchResponse.INVALID_LAST_STABLE_OFFSET,
          FetchResponse.INVALID_LOG_START_OFFSET, null, MemoryRecords.EMPTY)
    else
      for ((topicPartition, partitionData) <- fetchRequest.fetchData.asScala) {
        if (!authorize(request.session, Read, new Resource(Topic, topicPartition.topic)))
          unauthorizedTopicResponseData += topicPartition -> new FetchResponse.PartitionData(Errors.TOPIC_AUTHORIZATION_FAILED,
            FetchResponse.INVALID_HIGHWATERMARK, FetchResponse.INVALID_LAST_STABLE_OFFSET,
            FetchResponse.INVALID_LOG_START_OFFSET, null, MemoryRecords.EMPTY)
        else if (!metadataCache.contains(topicPartition.topic))
          nonExistingTopicResponseData += topicPartition -> new FetchResponse.PartitionData(Errors.UNKNOWN_TOPIC_OR_PARTITION,
            FetchResponse.INVALID_HIGHWATERMARK, FetchResponse.INVALID_LAST_STABLE_OFFSET,
            FetchResponse.INVALID_LOG_START_OFFSET, null, MemoryRecords.EMPTY)
        else
          authorizedRequestInfo += (topicPartition -> partitionData)
      }

    def convertedPartitionData(tp: TopicPartition, data: FetchResponse.PartitionData) = {
      // Down-conversion of the fetched records is needed when the stored magic version is
      // greater than that supported by the client (as indicated by the fetch request version). If the
      // configured magic version for the topic is less than or equal to that supported by the version of the
      // fetch request, we skip the iteration through the records in order to check the magic version since we
      // know it must be supported. However, if the magic version is changed from a higher version back to a
      // lower version, this check will no longer be valid and we will fail to down-convert the messages
      // which were written in the new format prior to the version downgrade.
      replicaManager.getMagic(tp).flatMap { magic =>
        val downConvertMagic = {
          if (magic > RecordBatch.MAGIC_VALUE_V0 && versionId <= 1 && !data.records.hasCompatibleMagic(RecordBatch.MAGIC_VALUE_V0))
            Some(RecordBatch.MAGIC_VALUE_V0)
          else if (magic > RecordBatch.MAGIC_VALUE_V1 && versionId <= 3 && !data.records.hasCompatibleMagic(RecordBatch.MAGIC_VALUE_V1))
            Some(RecordBatch.MAGIC_VALUE_V1)
          else
            None
        }

        downConvertMagic.map { magic =>
          trace(s"Down converting records from partition $tp to message format version $magic for fetch request from $clientId")
          val converted = data.records.downConvert(magic, fetchRequest.fetchData.get(tp).fetchOffset, time)
          updateRecordsProcessingStats(request, tp, converted.recordsProcessingStats)
          new FetchResponse.PartitionData(data.error, data.highWatermark, FetchResponse.INVALID_LAST_STABLE_OFFSET,
            data.logStartOffset, data.abortedTransactions, converted.records)
        }
      }.getOrElse(data)
    }

    // the callback for process a fetch response, invoked before throttling
    def processResponseCallback(responsePartitionData: Seq[(TopicPartition, FetchPartitionData)]) {
      val partitionData = {
        responsePartitionData.map { case (tp, data) =>
          val abortedTransactions = data.abortedTransactions.map(_.asJava).orNull
          val lastStableOffset = data.lastStableOffset.getOrElse(FetchResponse.INVALID_LAST_STABLE_OFFSET)
          tp -> new FetchResponse.PartitionData(data.error, data.highWatermark, lastStableOffset,
            data.logStartOffset, abortedTransactions, data.records)
        }
      }

      val mergedPartitionData = partitionData ++ unauthorizedTopicResponseData ++ nonExistingTopicResponseData

      val fetchedPartitionData = new util.LinkedHashMap[TopicPartition, FetchResponse.PartitionData]()

      mergedPartitionData.foreach { case (topicPartition, data) =>
        if (data.error != Errors.NONE)
          debug(s"Fetch request with correlation id ${request.header.correlationId} from client $clientId " +
            s"on partition $topicPartition failed due to ${data.error.exceptionName}")

        fetchedPartitionData.put(topicPartition, data)
      }

      // fetch response callback invoked after any throttling
      def fetchResponseCallback(bandwidthThrottleTimeMs: Int) {
        def createResponse(requestThrottleTimeMs: Int): FetchResponse = {
          val convertedData = new util.LinkedHashMap[TopicPartition, FetchResponse.PartitionData]
          fetchedPartitionData.asScala.foreach { case (tp, partitionData) =>
            convertedData.put(tp, convertedPartitionData(tp, partitionData))
          }
          val response = new FetchResponse(convertedData, bandwidthThrottleTimeMs + requestThrottleTimeMs)
          response.responseData.asScala.foreach { case (topicPartition, data) =>
            // record the bytes out metrics only when the response is being sent
            brokerTopicStats.updateBytesOut(topicPartition.topic, fetchRequest.isFromFollower, data.records.sizeInBytes)
          }
          response
        }

        if (fetchRequest.isFromFollower)
          sendResponseExemptThrottle(request, createResponse(0))
        else
          sendResponseMaybeThrottle(request, requestThrottleMs => createResponse(requestThrottleMs))
      }

      // When this callback is triggered, the remote API call has completed.
      // Record time before any byte-rate throttling.
      request.apiRemoteCompleteTimeNanos = time.nanoseconds

      if (fetchRequest.isFromFollower) {
        // We've already evaluated against the quota and are good to go. Just need to record it now.
        val responseSize = sizeOfThrottledPartitions(versionId, fetchRequest, mergedPartitionData, quotas.leader)
        quotas.leader.record(responseSize)
        fetchResponseCallback(bandwidthThrottleTimeMs = 0)
      } else {
        // Fetch size used to determine throttle time is calculated before any down conversions.
        // This may be slightly different from the actual response size. But since down conversions
        // result in data being loaded into memory, it is better to do this after throttling to avoid OOM.
        val response = new FetchResponse(fetchedPartitionData, 0)
        val responseStruct = response.toStruct(versionId)
        quotas.fetch.maybeRecordAndThrottle(request.session.sanitizedUser, clientId, responseStruct.sizeOf,
          fetchResponseCallback)
      }
    }

    if (authorizedRequestInfo.isEmpty)
      processResponseCallback(Seq.empty)
    else {
      // call the replica manager to fetch messages from the local replica
      replicaManager.fetchMessages(
        fetchRequest.maxWait.toLong,     // maximum wait time
        fetchRequest.replicaId,          // broker id of the follower replica
        fetchRequest.minBytes,           // minimum number of bytes requested by the fetch
        fetchRequest.maxBytes,           // maximum number of bytes requested by the fetch
        versionId <= 2,
        authorizedRequestInfo,
        replicationQuota(fetchRequest),
        processResponseCallback,         // callback function
        fetchRequest.isolationLevel)     // isolation level: READ_UNCOMMITTED or READ_COMMITTED
    }
  }
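To make the down-conversion branch in convertedPartitionData() easier to follow, here is a minimal, self-contained sketch of the decision it makes: given the message format (magic) stored on disk and the fetch request version, pick the format to down-convert to, or none if the client can already read what is stored. The object and parameter names below are illustrative stand-ins, not Kafka APIs.

object DownConvertSketch {
  val MagicV0: Byte = 0
  val MagicV1: Byte = 1
  val MagicV2: Byte = 2

  // Returns the target magic to down-convert to, or None if a client that sent the given
  // fetch request version can already read the stored format.
  // `recordsCompatibleWith(m)` stands in for data.records.hasCompatibleMagic(m).
  def downConvertMagic(storedMagic: Byte, fetchVersion: Short,
                       recordsCompatibleWith: Byte => Boolean): Option[Byte] = {
    if (storedMagic > MagicV0 && fetchVersion <= 1 && !recordsCompatibleWith(MagicV0)) Some(MagicV0)
    else if (storedMagic > MagicV1 && fetchVersion <= 3 && !recordsCompatibleWith(MagicV1)) Some(MagicV1)
    else None
  }

  def main(args: Array[String]): Unit = {
    // Records stored with magic v2 are only "compatible" with a target magic >= 2.
    val compat: Byte => Boolean = target => MagicV2 <= target
    // An old (v1) fetch request against v2-format data must be down-converted to magic v0.
    println(downConvertMagic(MagicV2, fetchVersion = 1, compat)) // Some(0)
    // A newer fetch request can read the stored format directly, so no conversion is needed.
    println(downConvertMagic(MagicV2, fetchVersion = 6, compat)) // None
  }
}

The point of the check is to avoid scanning every batch when the request version already implies compatibility, while still paying the (memory-heavy) conversion cost only when an old client truly needs it.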

ReplicaManager.fetchMessages() then reads from the local log by calling readFromLocalLog():

  /**
   * Fetch messages from the leader replica, and wait until enough data can be fetched and return;
   * the callback function will be triggered either when timeout or required fetch info is satisfied
   */
  def fetchMessages(timeout: Long,
                    replicaId: Int,
                    fetchMinBytes: Int,
                    fetchMaxBytes: Int,
                    hardMaxBytesLimit: Boolean,
                    fetchInfos: Seq[(TopicPartition, PartitionData)],
                    quota: ReplicaQuota = UnboundedQuota,
                    responseCallback: Seq[(TopicPartition, FetchPartitionData)] => Unit,
                    isolationLevel: IsolationLevel) {
    // replicaId >= 0 means this is a fetch from the follower replica on that broker id; otherwise it is a consumer fetch
    val isFromFollower = Request.isValidBrokerId(replicaId)
    // currently data can only be fetched from the leader, so fetchOnlyFromLeader is effectively always true
    val fetchOnlyFromLeader = replicaId != Request.DebuggingConsumerId
    // fetchOnlyCommitted = true means the request comes from a consumer and may only read up to the HW;
    // a follower replica sync has no such restriction (false)
    val fetchOnlyCommitted = !isFromFollower

    def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
      // read from the local log
      val result = readFromLocalLog(
        replicaId = replicaId,
        fetchOnlyFromLeader = fetchOnlyFromLeader,
        readOnlyCommitted = fetchOnlyCommitted,
        fetchMaxBytes = fetchMaxBytes,
        hardMaxBytesLimit = hardMaxBytesLimit,
        readPartitionInfo = fetchInfos,
        quota = quota,
        isolationLevel = isolationLevel)
      // if the request comes from a follower replica, update the follower's fetch state
      if (isFromFollower) updateFollowerLogReadResults(replicaId, result)
      else result
    }

    val logReadResults = readFromLog()

    // check if this fetch request can be satisfied right away
    val logReadResultValues = logReadResults.map { case (_, v) => v }
    val bytesReadable = logReadResultValues.map(_.info.records.sizeInBytes).sum
    val errorReadingData = logReadResultValues.foldLeft(false) ((errorIncurred, readResult) =>
      errorIncurred || (readResult.error != Errors.NONE))

    // respond immediately if 1) fetch request does not want to wait
    //                        2) fetch request does not require any data
    //                        3) has enough data to respond
    //                        4) some error happens while reading data
    // i.e. return right away if: timeout <= 0, nothing was requested, enough data was read, or an error occurred while reading
    if (timeout <= 0 || fetchInfos.isEmpty || bytesReadable >= fetchMinBytes || errorReadingData) {
      val fetchPartitionData = logReadResults.map { case (tp, result) =>
        tp -> FetchPartitionData(result.error, result.highWatermark, result.leaderLogStartOffset, result.info.records,
          result.lastStableOffset, result.info.abortedTransactions)
      }
      responseCallback(fetchPartitionData)
    } else {
      // the conditions for an immediate response are not met, so the response has to be delayed
      // construct the fetch results from the read results
      val fetchPartitionStatus = logReadResults.map { case (topicPartition, result) =>
        val fetchInfo = fetchInfos.collectFirst {
          case (tp, v) if tp == topicPartition => v
        }.getOrElse(sys.error(s"Partition $topicPartition not found in fetchInfos"))
        (topicPartition, FetchPartitionStatus(result.info.fetchOffsetMetadata, fetchInfo))
      }
      val fetchMetadata = FetchMetadata(fetchMinBytes, fetchMaxBytes, hardMaxBytesLimit, fetchOnlyFromLeader,
        fetchOnlyCommitted, isFromFollower, replicaId, fetchPartitionStatus)
      val delayedFetch = new DelayedFetch(timeout, fetchMetadata, this, quota, isolationLevel, responseCallback)

      // create a list of (topic, partition) pairs to use as keys for this delayed fetch operation
      val delayedFetchKeys = fetchPartitionStatus.map { case (tp, _) => new TopicPartitionOperationKey(tp) }

      // try to complete the request immediately, otherwise put it into the purgatory;
      // this is because while the delayed fetch operation is being created, new requests
      // may arrive and hence make this operation completable.
      delayedFetchPurgatory.tryCompleteElseWatch(delayedFetch, delayedFetchKeys)
    }
  }

  /**
   * Read from multiple topic partitions at the given offset up to maxSize bytes
   */
  def readFromLocalLog(replicaId: Int,
                       fetchOnlyFromLeader: Boolean,
                       readOnlyCommitted: Boolean,
                       fetchMaxBytes: Int,
                       hardMaxBytesLimit: Boolean,
                       readPartitionInfo: Seq[(TopicPartition, PartitionData)],
                       quota: ReplicaQuota,
                       isolationLevel: IsolationLevel): Seq[(TopicPartition, LogReadResult)] = {

    def read(tp: TopicPartition, fetchInfo: PartitionData, limitBytes: Int, minOneMessage: Boolean): LogReadResult = {
      val offset = fetchInfo.fetchOffset
      val partitionFetchSize = fetchInfo.maxBytes
      val followerLogStartOffset = fetchInfo.logStartOffset

      brokerTopicStats.topicStats(tp.topic).totalFetchRequestRate.mark()
      brokerTopicStats.allTopicsStats.totalFetchRequestRate.mark()

      try {
        trace(s"Fetching log segment for partition $tp, offset $offset, partition fetch size $partitionFetchSize, " +
          s"remaining response limit $limitBytes" +
          (if (minOneMessage) s", ignoring response/partition size limits" else ""))

        // decide whether to only fetch from leader
        // fetchOnlyFromLeader is currently always true
        val localReplica = if (fetchOnlyFromLeader) {
          // get the leader Replica object for this partition
          getLeaderReplicaIfLocal(tp)
        } else
          getReplicaOrException(tp)

        // the HW position; replica sync does not bound reads by this value
        val initialHighWatermark = localReplica.highWatermark.messageOffset
        val lastStableOffset = if (isolationLevel == IsolationLevel.READ_COMMITTED)
          Some(localReplica.lastStableOffset.messageOffset)
        else
          None

        // decide whether to only fetch committed data (i.e. messages below high watermark)
        // readOnlyCommitted = true means a consumer that may only read up to the HW;
        // false means a follower sync, which has no such limit
        val maxOffsetOpt = if (readOnlyCommitted)
          Some(lastStableOffset.getOrElse(initialHighWatermark))
        else
          None

        /* Read the LogOffsetMetadata prior to performing the read from the log.
         * We use the LogOffsetMetadata to determine if a particular replica is in-sync or not.
         * Using the log end offset after performing the read can lead to a race condition
         * where data gets appended to the log immediately after the replica has consumed from it
         * This can cause a replica to always be out of sync.
         */
        // log end offset
        val initialLogEndOffset = localReplica.logEndOffset.messageOffset
        // log start offset
        val initialLogStartOffset = localReplica.logStartOffset
        val fetchTimeMs = time.milliseconds
        val logReadInfo = localReplica.log match {
          case Some(log) =>
            val adjustedFetchSize = math.min(partitionFetchSize, limitBytes)

            // Try the read first, this tells us whether we need all of adjustedFetchSize for this partition
            // read from the given offset; for a follower replica sync, maxOffsetOpt = None
            val fetch = log.read(offset, adjustedFetchSize, maxOffsetOpt, minOneMessage, isolationLevel)

            // If the partition is being throttled, simply return an empty set.
            if (shouldLeaderThrottle(quota, tp, replicaId))
              FetchDataInfo(fetch.fetchOffsetMetadata, MemoryRecords.EMPTY)
            // For FetchRequest version 3, we replace incomplete message sets with an empty one as consumers can make
            // progress in such cases and don't need to report a `RecordTooLargeException`
            else if (!hardMaxBytesLimit && fetch.firstEntryIncomplete)
              FetchDataInfo(fetch.fetchOffsetMetadata, MemoryRecords.EMPTY)
            else fetch

          case None =>
            error(s"Leader for partition $tp does not have a local log")
            FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata, MemoryRecords.EMPTY)
        }

        LogReadResult(info = logReadInfo,
          highWatermark = initialHighWatermark,
          leaderLogStartOffset = initialLogStartOffset,
          leaderLogEndOffset = initialLogEndOffset,
          followerLogStartOffset = followerLogStartOffset,
          fetchTimeMs = fetchTimeMs,
          readSize = partitionFetchSize,
          lastStableOffset = lastStableOffset,
          exception = None)
      } catch {
        // NOTE: Failed fetch requests metric is not incremented for known exceptions since it
        // is supposed to indicate un-expected failure of a broker in handling a fetch request
        case e@ (_: UnknownTopicOrPartitionException |
                 _: NotLeaderForPartitionException |
                 _: ReplicaNotAvailableException |
                 _: KafkaStorageException |
                 _: OffsetOutOfRangeException) =>
          LogReadResult(info = FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata, MemoryRecords.EMPTY),
            highWatermark = -1L,
            leaderLogStartOffset = -1L,
            leaderLogEndOffset = -1L,
            followerLogStartOffset = -1L,
            fetchTimeMs = -1L,
            readSize = partitionFetchSize,
            lastStableOffset = None,
            exception = Some(e))
        case e: Throwable =>
          brokerTopicStats.topicStats(tp.topic).failedFetchRequestRate.mark()
          brokerTopicStats.allTopicsStats.failedFetchRequestRate.mark()
          error(s"Error processing fetch operation on partition $tp, offset $offset", e)
          LogReadResult(info = FetchDataInfo(LogOffsetMetadata.UnknownOffsetMetadata, MemoryRecords.EMPTY),
            highWatermark = -1L,
            leaderLogStartOffset = -1L,
            leaderLogEndOffset = -1L,
            followerLogStartOffset = -1L,
            fetchTimeMs = -1L,
            readSize = partitionFetchSize,
            lastStableOffset = None,
            exception = Some(e))
      }
    }

    var limitBytes = fetchMaxBytes
    // accumulates the fetched results
    val result = new mutable.ArrayBuffer[(TopicPartition, LogReadResult)]
    var minOneMessage = !hardMaxBytesLimit
    // iterate over the partitions to fetch and call read() for each of them
    readPartitionInfo.foreach { case (tp, fetchInfo) =>
      val readResult = read(tp, fetchInfo, limitBytes, minOneMessage)
      val recordBatchSize = readResult.info.records.sizeInBytes
      // Once we read from a non-empty partition, we stop ignoring request and partition level size limits
      if (recordBatchSize > 0)
        minOneMessage = false
      limitBytes = math.max(0, limitBytes - recordBatchSize)
      result += (tp -> readResult)
    }
    result
  }
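The core of fetchMessages() is the decision between answering right away and parking the request as a DelayedFetch in the purgatory. The following self-contained sketch restates the four conditions; the case class and field names are simplified stand-ins, not the real ReplicaManager types.

object FetchDecisionSketch {
  final case class ReadSummary(bytesReadable: Int, hadError: Boolean)

  // Mirrors the four conditions in fetchMessages(): no wait requested, nothing requested,
  // enough data already available, or an error occurred while reading.
  def canRespondImmediately(timeoutMs: Long, requestedPartitions: Int, minBytes: Int,
                            summary: ReadSummary): Boolean =
    timeoutMs <= 0 ||
      requestedPartitions == 0 ||
      summary.bytesReadable >= minBytes ||
      summary.hadError

  def main(args: Array[String]): Unit = {
    // Not enough bytes yet and the client is willing to wait: this fetch becomes a DelayedFetch.
    println(canRespondImmediately(timeoutMs = 500, requestedPartitions = 2, minBytes = 1024,
      ReadSummary(bytesReadable = 100, hadError = false))) // false
    // Enough bytes are available, so the response callback fires right away.
    println(canRespondImmediately(timeoutMs = 500, requestedPartitions = 2, minBytes = 1024,
      ReadSummary(bytesReadable = 4096, hadError = false))) // true
  }
}

This is what makes long polling work: consumers and followers set fetch.max.wait.ms and fetch.min.bytes, and the broker holds the request until either condition is met instead of returning empty responses in a tight loop.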

readFromLocalLog() then calls Log.read():

  /**
   * Read messages from the log.
   *
   * @param startOffset The offset to begin reading at
   * @param maxLength The maximum number of bytes to read
   * @param maxOffset The offset to read up to, exclusive. (i.e. this offset NOT included in the resulting message set)
   * @param minOneMessage If this is true, the first message will be returned even if it exceeds `maxLength` (if one exists)
   * @param isolationLevel The isolation level of the fetcher. The READ_UNCOMMITTED isolation level has the traditional
   *                       read semantics (e.g. consumers are limited to fetching up to the high watermark). In
   *                       READ_COMMITTED, consumers are limited to fetching up to the last stable offset. Additionally,
   *                       in READ_COMMITTED, the transaction index is consulted after fetching to collect the list
   *                       of aborted transactions in the fetch range which the consumer uses to filter the fetched
   *                       records before they are returned to the user. Note that fetches from followers always use
   *                       READ_UNCOMMITTED.
   *
   * @throws OffsetOutOfRangeException If startOffset is beyond the log end offset or before the log start offset
   * @return The fetch data information including fetch starting offset metadata and messages read.
   */
  def read(startOffset: Long, maxLength: Int, maxOffset: Option[Long] = None, minOneMessage: Boolean = false,
           isolationLevel: IsolationLevel): FetchDataInfo = {
    maybeHandleIOException(s"Exception while reading from $topicPartition in dir ${dir.getParent}") {
      trace(s"Reading $maxLength bytes from offset $startOffset of length $size bytes")

      // Because we don't use lock for reading, the synchronization is a little bit tricky.
      // We create the local variables to avoid race conditions with updates to the log.
      val currentNextOffsetMetadata = nextOffsetMetadata
      val next = currentNextOffsetMetadata.messageOffset
      if (startOffset == next) {
        val abortedTransactions =
          if (isolationLevel == IsolationLevel.READ_COMMITTED) Some(List.empty[AbortedTransaction])
          else None
        return FetchDataInfo(currentNextOffsetMetadata, MemoryRecords.EMPTY, firstEntryIncomplete = false,
          abortedTransactions = abortedTransactions)
      }

      // look up the corresponding log segment (LogSegment) in the skip list
      var segmentEntry = segments.floorEntry(startOffset)

      // return error on attempt to read beyond the log end offset or read below log start offset
      if (startOffset > next || segmentEntry == null || startOffset < logStartOffset)
        throw new OffsetOutOfRangeException("Request for offset %d but we only have log segments in the range %d to %d.".format(startOffset, logStartOffset, next))

      // Do the read on the segment with a base offset less than the target offset
      // but if that segment doesn't contain any messages with an offset greater than that
      // continue to read from successive segments until we get some messages or we reach the end of the log
      while (segmentEntry != null) {
        val segment = segmentEntry.getValue

        // If the fetch occurs on the active segment, there might be a race condition where two fetch requests occur after
        // the message is appended but before the nextOffsetMetadata is updated. In that case the second fetch may
        // cause OffsetOutOfRangeException. To solve that, we cap the reading up to exposed position instead of the log
        // end of the active segment.
        val maxPosition = {
          if (segmentEntry == segments.lastEntry) {
            val exposedPos = nextOffsetMetadata.relativePositionInSegment.toLong
            // Check the segment again in case a new segment has just rolled out.
            if (segmentEntry != segments.lastEntry)
              // New log segment has rolled out, we can read up to the file end.
              segment.size
            else
              exposedPos
          } else {
            segment.size
          }
        }
        // read the data from the LogSegment
        val fetchInfo = segment.read(startOffset, maxOffset, maxLength, maxPosition, minOneMessage)
        if (fetchInfo == null) {
          // nothing was read from this segment, so move on to the next higher segment
          segmentEntry = segments.higherEntry(segmentEntry.getKey)
        } else {
          return isolationLevel match {
            case IsolationLevel.READ_UNCOMMITTED => fetchInfo
            case IsolationLevel.READ_COMMITTED => addAbortedTransactions(startOffset, segmentEntry, fetchInfo)
          }
        }
      }

      // okay we are beyond the end of the last segment with no data fetched although the start offset is in range,
      // this can happen when all messages with offset larger than start offsets have been deleted.
      // In this case, we will return the empty set with log end offset metadata
      FetchDataInfo(nextOffsetMetadata, MemoryRecords.EMPTY)
    }
  }
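The segment lookup above works because Log keeps its segments in a ConcurrentSkipListMap keyed by each segment's base offset: floorEntry(startOffset) returns the segment that should contain the requested offset, and higherEntry() moves to the next segment when that one has nothing left to return. A minimal sketch, with a stand-in SegmentInfo type instead of the real LogSegment:

import java.util.concurrent.ConcurrentSkipListMap

object SegmentLookupSketch {
  final case class SegmentInfo(baseOffset: Long, sizeInBytes: Int)

  def main(args: Array[String]): Unit = {
    // Segments keyed by base offset, as in Log's skip list.
    val segments = new ConcurrentSkipListMap[java.lang.Long, SegmentInfo]()
    segments.put(0L, SegmentInfo(0L, 1048576))
    segments.put(5000L, SegmentInfo(5000L, 1048576))
    segments.put(12000L, SegmentInfo(12000L, 262144))

    // A fetch at offset 7300 falls into the segment whose base offset is the largest one <= 7300.
    val entry = segments.floorEntry(7300L)
    println(entry.getValue) // SegmentInfo(5000,1048576)

    // If that segment yields no data past the fetch offset, Log.read() advances to the next
    // higher segment, mirroring segments.higherEntry(entry.getKey).
    println(segments.higherEntry(entry.getKey).getValue) // SegmentInfo(12000,262144)
  }
}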

Log.read() in turn calls LogSegment.read():

  /**
   * Read a message set from this segment beginning with the first offset >= startOffset. The message set will include
   * no more than maxSize bytes and will end before maxOffset if a maxOffset is specified.
   *
   * @param startOffset A lower bound on the first offset to include in the message set we read
   * @param maxSize The maximum number of bytes to include in the message set we read
   * @param maxOffset An optional maximum offset for the message set we read
   * @param maxPosition The maximum position in the log segment that should be exposed for read
   * @param minOneMessage If this is true, the first message will be returned even if it exceeds `maxSize` (if one exists)
   *
   * @return The fetched data and the offset metadata of the first message whose offset is >= startOffset,
   *         or null if the startOffset is larger than the largest offset in this log
   */
  @threadsafe
  def read(startOffset: Long, maxOffset: Option[Long], maxSize: Int, maxPosition: Long = size,
           minOneMessage: Boolean = false): FetchDataInfo = {
    if (maxSize < 0)
      throw new IllegalArgumentException("Invalid max size for log read (%d)".format(maxSize))

    // size of the log file
    val logSize = log.sizeInBytes // this may change, need to save a consistent copy
    // translate the starting offset into the actual physical position
    val startOffsetAndSize = translateOffset(startOffset)

    // if the start position is already off the end of the log, return null
    if (startOffsetAndSize == null)
      return null

    val startPosition = startOffsetAndSize.position
    val offsetMetadata = new LogOffsetMetadata(startOffset, this.baseOffset, startPosition)

    val adjustedMaxSize =
      if (minOneMessage) math.max(maxSize, startOffsetAndSize.size)
      else maxSize

    // return a log segment but with zero size in the case below
    if (adjustedMaxSize == 0)
      return FetchDataInfo(offsetMetadata, MemoryRecords.EMPTY)

    // calculate the length of the message set to read based on whether or not they gave us a maxOffset
    val fetchSize: Int = maxOffset match {
      // maxOffset = None: the calculation used for follower replica sync
      case None =>
        // no max offset, just read until the max position
        min((maxPosition - startPosition).toInt, adjustedMaxSize)
      // the calculation used for consumer fetches
      case Some(offset) =>
        // there is a max offset, translate it to a file position and use that to calculate the max read size;
        // when the leader of a partition changes, it's possible for the new leader's high watermark to be less than the
        // true high watermark in the previous leader for a short window. In this window, if a consumer fetches on an
        // offset between new leader's high watermark and the log end offset, we want to return an empty response.
        if (offset < startOffset)
          return FetchDataInfo(offsetMetadata, MemoryRecords.EMPTY, firstEntryIncomplete = false)
        val mapping = translateOffset(offset, startPosition)
        val endPosition =
          if (mapping == null)
            logSize // the max offset is off the end of the log, use the end of the file
          else
            mapping.position
        min(min(maxPosition, endPosition) - startPosition, adjustedMaxSize).toInt
    }

    // read the data file starting from the physical start position for fetchSize bytes
    FetchDataInfo(offsetMetadata, log.read(startPosition, fetchSize),
      firstEntryIncomplete = adjustedMaxSize < startOffsetAndSize.size)
  }

  /**
   * Find the physical file position for the first message with offset >= the requested offset.
   *
   * The startingFilePosition argument is an optimization that can be used if we already know a valid starting position
   * in the file higher than the greatest-lower-bound from the index.
   *
   * @param offset The offset we want to translate
   * @param startingFilePosition A lower bound on the file position from which to begin the search. This is purely an optimization and
   * when omitted, the search will begin at the position in the offset index.
   * @return The position in the log storing the message with the least offset >= the requested offset and the size of the
   *        message or null if no message meets this criteria.
   */
  // Look up the offset index: index.lookup() returns the physical position closest to (at or below) the target offset;
  // the data file is then scanned from that physical position until the entry for the target offset is found.
  @threadsafe
  private[log] def translateOffset(offset: Long, startingFilePosition: Int = 0): LogOffsetPosition = {
    val mapping = index.lookup(offset)
    log.searchForOffsetWithSize(offset, max(mapping.position, startingFilePosition))
  }
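translateOffset() works in two steps because the offset index is sparse: the index lookup returns the greatest indexed offset at or below the target together with its file position, and the data file is then scanned forward from that position. Below is a minimal in-memory sketch of that idea; the real OffsetIndex binary-searches a memory-mapped file, and the types and names here are illustrative stand-ins.

object TranslateOffsetSketch {
  final case class IndexEntry(offset: Long, position: Int)
  final case class BatchInfo(baseOffset: Long, lastOffset: Long, position: Int)

  // Step 1: find the greatest indexed offset <= target (a linear scan is enough for this sketch).
  def lookup(index: Vector[IndexEntry], target: Long): IndexEntry =
    index.takeWhile(_.offset <= target).lastOption.getOrElse(IndexEntry(0L, 0))

  // Step 2: scan batches starting at the indexed position until one covers the target offset.
  def searchForOffset(batches: Seq[BatchInfo], startPosition: Int, target: Long): Option[BatchInfo] =
    batches.filter(_.position >= startPosition).find(_.lastOffset >= target)

  def main(args: Array[String]): Unit = {
    // Sparse index: only every few thousand bytes of the data file get an index entry.
    val index = Vector(IndexEntry(0L, 0), IndexEntry(40L, 4096), IndexEntry(80L, 8192))
    val batches = Seq(
      BatchInfo(0L, 39L, 0), BatchInfo(40L, 54L, 4096),
      BatchInfo(55L, 79L, 6144), BatchInfo(80L, 99L, 8192))

    val nearest = lookup(index, target = 57L)                  // IndexEntry(40,4096)
    println(searchForOffset(batches, nearest.position, 57L))   // Some(BatchInfo(55,79,6144))
  }
}

The sparse index keeps index files small enough to memory-map, at the cost of a short forward scan in the data file after each lookup.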

Finally, FileRecords.read() is called to read the data according to the position and size parameters:

    /**
     * Return a slice of records from this instance, which is a view into this set starting from the given position
     * and with the given size limit.
     *
     * If the size is beyond the end of the file, the end will be based on the size of the file at the time of the read.
     *
     * If this message set is already sliced, the position will be taken relative to that slicing.
     *
     * @param position The start position to begin the read from
     * @param size The number of bytes after the start position to include
     * @return A sliced wrapper on this message set limited based on the given position and size
     */
    public FileRecords read(int position, int size) throws IOException {
        if (position < 0)
            throw new IllegalArgumentException("Invalid position: " + position);
        if (size < 0)
            throw new IllegalArgumentException("Invalid size: " + size);

        final int end;
        // handle integer overflow
        if (this.start + position + size < 0)
            end = sizeInBytes();
        else
            end = Math.min(this.start + position + size, sizeInBytes());
        return new FileRecords(file, channel, this.start + position, end, true);
    }
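Note that FileRecords.read() does not copy any data: it returns a new FileRecords that is a view over the byte range [start + position, end), where end is clamped to the current file size and guarded against integer overflow. The sketch below restates just that bounds arithmetic in Scala (no real file I/O; the names are illustrative, not Kafka APIs).

object SliceBoundsSketch {
  // Returns (sliceStart, sliceEnd) for a read of `size` bytes at `position` within a slice that
  // itself begins at `start` of a file that is currently `fileSizeInBytes` long.
  def sliceBounds(start: Int, position: Int, size: Int, fileSizeInBytes: Int): (Int, Int) = {
    require(position >= 0, s"Invalid position: $position")
    require(size >= 0, s"Invalid size: $size")
    val end =
      if (start + position + size < 0) fileSizeInBytes     // integer overflow: fall back to the file size
      else math.min(start + position + size, fileSizeInBytes)
    (start + position, end)
  }

  def main(args: Array[String]): Unit = {
    // A 64 KB read near the end of a 100 KB file is clamped to the file size.
    println(sliceBounds(start = 0, position = 90 * 1024, size = 64 * 1024, fileSizeInBytes = 100 * 1024))
    // (92160,102400)
  }
}

Because the result is only a view, the actual bytes are transferred later (typically straight from the page cache to the socket), which is a large part of why fetch handling stays cheap on the broker.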
