private void checkBlockSizeReached() throws IOException {
  if (recordCount >= recordCountForNextMemCheck) {
    // checking the memory size is relatively expensive, so let's not do it for every record.
    long memSize = columnStore.getBufferedSize();
    long recordSize = memSize / recordCount;
    // flush the row group if it is within ~2 records of the limit
    // it is much better to be slightly under size than to be over at all
    if (memSize > (nextRowGroupSize - 2 * recordSize)) {
      LOG.info("mem size {} > {}: flushing {} records to disk.", memSize, nextRowGroupSize, recordCount);
      flushRowGroupToStore();
      initStore();
      recordCountForNextMemCheck = min(max(MINIMUM_RECORD_COUNT_FOR_CHECK, recordCount / 2), MAXIMUM_RECORD_COUNT_FOR_CHECK);
      this.lastRowGroupEndPos = parquetFileWriter.getPos();
    } else {
      recordCountForNextMemCheck = min(
          max(MINIMUM_RECORD_COUNT_FOR_CHECK, (recordCount + (long)(nextRowGroupSize / ((float)recordSize))) / 2), // will check halfway
          recordCount + MAXIMUM_RECORD_COUNT_FOR_CHECK); // will not look more than max records ahead
      LOG.debug("Checked mem at {} will check again at: {}", recordCount, recordCountForNextMemCheck);
    }
  }
}
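The overshoot can be reproduced with a minimal, self-contained simulation of the check above (this is an illustrative sketch, not Parquet code; the class name and record sizes are invented, while the constants and projection formula mirror the snippet):

```java
public class RowGroupOvershootDemo {
    static final long MINIMUM_RECORD_COUNT_FOR_CHECK = 100;
    static final long MAXIMUM_RECORD_COUNT_FOR_CHECK = 10000;

    // Returns {recordCountAtFlush, bufferedBytesAtFlush} for a stream of
    // 100 ten-byte records followed by 10 KB records, with an 8 MB target.
    static long[] simulate() {
        long nextRowGroupSize = 8L * 1024 * 1024;
        long memSize = 0;
        long recordCount = 0;
        long recordCountForNextMemCheck = MINIMUM_RECORD_COUNT_FOR_CHECK;
        for (int i = 0; i < 20000; i++) {
            memSize += (i < 100) ? 10 : 10240;
            recordCount++;
            if (recordCount >= recordCountForNextMemCheck) {
                long recordSize = memSize / recordCount; // average record size so far
                if (memSize > nextRowGroupSize - 2 * recordSize) {
                    return new long[] { recordCount, memSize }; // would flush here
                }
                // project the next check halfway to the estimated limit,
                // looking at most MAXIMUM_RECORD_COUNT_FOR_CHECK records ahead
                recordCountForNextMemCheck = Math.min(
                        Math.max(MINIMUM_RECORD_COUNT_FOR_CHECK,
                                (recordCount + (long) (nextRowGroupSize / (float) recordSize)) / 2),
                        recordCount + MAXIMUM_RECORD_COUNT_FOR_CHECK);
            }
        }
        return new long[] { recordCount, memSize };
    }

    public static void main(String[] args) {
        long[] r = simulate();
        System.out.printf("flushed at %d records with %d bytes buffered (%.1fx the 8 MB target)%n",
                r[0], r[1], (double) r[1] / (8L * 1024 * 1024));
    }
}
```

Because the average record size at the first check is only 10 bytes, the next check is deferred to record 10,100; by then roughly 98 MB is buffered against the 8 MB target, which matches the oversized row groups described below.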

In this code, if the configured block size is small (for example 8 MB) and the first ~100 records are small while the records after them are much larger, the record-size estimate taken at the first check is far too low, so the next check is scheduled too many records ahead and the row group grows well past the target. In our production workload this produced row groups larger than 64 MB. I therefore think the interval between block-size checks should be configurable.
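One possible shape for such a knob is a cap on how many records the writer may go between checks (a sketch only; the property name parquet.memory.check.max-records and the MemCheckConfig class are hypothetical, not an existing Parquet API):

```java
import java.util.Properties;

// Hypothetical configuration wrapper: clamps how far ahead the writer
// may defer the next memory check, regardless of the projected estimate.
public final class MemCheckConfig {
    public static final String MAX_CHECK_RECORDS = "parquet.memory.check.max-records"; // hypothetical key

    private final long maxRecordsBetweenChecks;

    public MemCheckConfig(Properties props) {
        this.maxRecordsBetweenChecks =
                Long.parseLong(props.getProperty(MAX_CHECK_RECORDS, "10000"));
    }

    // Clamp the projected next-check point so that large late records
    // cannot push the buffered size far past the row-group target.
    public long nextCheck(long recordCount, long projectedCheck) {
        return Math.min(projectedCheck, recordCount + maxRecordsBetweenChecks);
    }
}
```

With the cap set to, say, 500 records, a projection of 419,480 records (as in the 8 MB / 10-byte case) would be clamped to 600, bounding the worst-case overshoot to 500 records' worth of data.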
