The problem with reading files in file-name order in MATLAB (the order in which dir reads files), 2020-12-17, Lance Jay's blog

The sorting problem when reading files by file name in MATLAB

Environment: MATLAB 2016a
OS: Windows 10

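In MATLAB you often need to read a batch of files, or every file of a given format in a folder, before moving on to further processing. For independent samples, the order in which the files are read does not affect the later analysis. For time-series data with timestamps, however, the files have to be read and concatenated in chronological order, because that order affects the quality of the model built on the data.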

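The function commonly used in MATLAB to list such files is dir.
dir finds the files of the specified format in the specified folder and returns them as a struct array with one element per file. In MATLAB 2016a this struct array has five fields: name, date, bytes, isdir and datenum.
(The original post shows a screenshot of this struct array here.)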
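
As a small illustration of the struct array described above, the sketch below creates a few empty files in a scratch folder and lists them with dir. The scratch folder and the file names (`a.txt`, `b.txt`, `c.txt`) are made up for the example and do not come from the original post.

```matlab
% Hypothetical scratch folder; tempname returns a unique path under tempdir.
demoDir = tempname;
mkdir(demoDir);

% Create three empty files with made-up names.
demoNames = {'a.txt', 'b.txt', 'c.txt'};
for k = 1:numel(demoNames)
    fid = fopen(fullfile(demoDir, demoNames{k}), 'w');
    fclose(fid);
end

% dir returns one struct per matching file.
listing = dir(fullfile(demoDir, '*.txt'));
disp(fieldnames(listing));      % field names of the struct array
names = {listing.name};         % collect every name into a cell array
disp(names);

% Remove the scratch folder and its contents.
rmdir(demoDir, 's');
```

Collecting the names with `{listing.name}` is the same step the later sort_nat call relies on.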

The function is used as follows:

trainPath = 'C:\Users\1210xlsx\';     % folder that holds your data files
train = dir([trainPath '*.xlsx']);    % list every .xlsx file in that folder
train_num = length(train);
data_all = [];
for i = 1:train_num
    % Use the name field of the i-th element of train to build the full file path
    [data, text] = xlsread([trainPath train(i).name]);
    data_all = [data_all; data];      % concatenate vertically
end
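
This way of reading has one problem, though: dir does not necessarily return the files in the order in which the file manager displays them. As the screenshot in the original post shows, .xlsx files that appear in normal order in the folder come back from dir in a different order.
The order dir uses corresponds to a plain string comparison of the file names: two names are compared character by character, and the first position at which they differ decides the order according to the code values of the characters there.
For example, sorting 1, 11, 101 and 2 this way gives 1, 101, 11, 2.
The comparison proceeds as follows:
First compare the first characters, which are 1, 1, 1 and 2. In most encodings the character 1 comes before the character 2, so 2 is placed last of the four files.
Then, for the three names that start with 1, compare the second characters: nothing for 1, 0 for 101, and 1 for 11. A name that has already ended sorts before any longer name with the same prefix, and 0 comes before 1, so these three are ordered 1, 101, 11.
The final order is therefore 1, 101, 11, 2.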
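
The worked example above can be reproduced without touching the file system. The sketch below applies MATLAB's built-in sort to the four names as char vectors, then sorts the same names by their numeric value for comparison. The variable names are chosen for the example only.

```matlab
% File names from the example above, stored as char vectors.
names = {'1', '11', '101', '2'};

% sort on a cell array of char vectors compares character codes,
% left to right, so it reproduces the order described in the text.
lexOrder = sort(names);
disp(lexOrder);

% Converting to numbers first gives the order by decimal value instead.
[~, idx] = sort(str2double(names));
numOrder = names(idx);
disp(numOrder);
```

Here `lexOrder` is the order 1, 101, 11, 2 from the text, while `numOrder` is 1, 2, 11, 101.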
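
Solution: the sort_nat function

The order produced by the code above is not the one we want, which is ordering by decimal value.
To sort by decimal value, use the sort_nat() function on the names in train.name. The call to sort_nat() and the source of sort_nat() are given below: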
trainPath = 'C:\Users\1210xlsx\';           % folder that holds your data files
train = dir([trainPath '*.xlsx']);          % list every .xlsx file in that folder
train_num = length(train);
data_all = [];
sort_nat_name = sort_nat({train.name});     % re-sort the names in natural order
for i = 1:train_num
    % Use the i-th element of the cell array sort_nat_name to build the full file path
    [data, text] = xlsread([trainPath sort_nat_name{i}]);
    data_all = [data_all; data];            % concatenate vertically
end
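
In essence, this re-sorts the file names returned by dir according to the decimal value of the runs of consecutive digits in each name.
Note that sort_nat returns a cell array, so each name has to be read with curly braces, as in sort_nat_name{i}. Parentheses ( ) return a 1-by-1 cell rather than the name itself, and square brackets [ ] are not an indexing operator.
Applied to the earlier example, train holds the order returned by dir and sort_nat_name holds the result of re-sorting that order. (The original post shows a screenshot comparing the two here.)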
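
The difference between the two kinds of indexing can be checked directly. In the sketch below, `sortedNames` is a made-up stand-in for the output of sort_nat; only `trainPath` is taken from the code above.

```matlab
% Made-up file names standing in for the output of sort_nat.
sortedNames = {'1.xlsx', '2.xlsx', '11.xlsx'};

asChar = sortedNames{2};   % braces return the contents: a char vector
asCell = sortedNames(2);   % parentheses return a 1-by-1 cell array

disp(class(asChar));       % char
disp(class(asCell));       % cell

% Only the char vector concatenates into a single path string.
trainPath = 'C:\Users\1210xlsx\';
fullName = [trainPath asChar];
disp(fullName);
```

Concatenating `trainPath` with `asCell` would produce a cell array instead of a path, which is why the loop uses `sort_nat_name{i}`.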
  
Below is the full source of sort_nat():
% Full source of sort_nat
function [cs,index] = sort_nat(c,mode)
%sort_nat: Natural order sort of cell array of strings.
% usage:  [S,INDEX] = sort_nat(C)
% where,
%    C is a cell array (vector) of strings to be sorted.
%    S is C, sorted in natural order.
%    INDEX is the sort order such that S = C(INDEX);
% Natural order sorting sorts strings containing digits in a way such that
% the numerical value of the digits is taken into account.  It is
% especially useful for sorting file names containing index numbers with
% different numbers of digits.  Often, people will use leading zeros to get
% the right sort order, but with this function you don't have to do that.
% For example, if C = {'file1.txt','file2.txt','file10.txt'}, a normal sort
% will give you
%       {'file1.txt'  'file10.txt'  'file2.txt'}
% whereas, sort_nat will give you
%       {'file1.txt'  'file2.txt'  'file10.txt'}
% See also: sort
% Version: 1.4, 22 January 2011
% Author:  Douglas M. Schwarz
% Email:   dmschwarz=ieee*org, dmschwarz=urgrad*rochester*edu
% Real_email = regexprep(Email,{'=','*'},{'@','.'})
% Set default value for mode if necessary.
if nargin < 2
    mode = 'ascend';
end
% Make sure mode is either 'ascend' or 'descend'.
modes = strcmpi(mode,{'ascend','descend'});
is_descend = modes(2);
if ~any(modes)
    error('sort_nat:sortDirection',...
        'sorting direction must be ''ascend'' or ''descend''.')
end
% Replace runs of digits with '0'.
c2 = regexprep(c,'\d+','0');
% Compute char version of c2 and locations of zeros.
s1 = char(c2);
z = s1 == '0';
% Extract the runs of digits and their start and end indices.
[digruns,first,last] = regexp(c,'\d+','match','start','end');
% Create matrix of numerical values of runs of digits and a matrix of the
% number of digits in each run.
num_str = length(c);
max_len = size(s1,2);
num_val = NaN(num_str,max_len);
num_dig = NaN(num_str,max_len);
for i = 1:num_str
    num_val(i,z(i,:)) = sscanf(sprintf('%s ',digruns{i}{:}),'%f');
    num_dig(i,z(i,:)) = last{i} - first{i} + 1;
end
% Find columns that have at least one non-NaN.  Make sure activecols is a
% 1-by-n vector even if n = 0.
activecols = reshape(find(~all(isnan(num_val))),1,[]);
n = length(activecols);
% Compute which columns in the composite matrix get the numbers.
numcols = activecols + (1:2:2*n);
% Compute which columns in the composite matrix get the number of digits.
ndigcols = numcols + 1;
% Compute which columns in the composite matrix get chars.
charcols = true(1,max_len + 2*n);
charcols(numcols) = false;
charcols(ndigcols) = false;
% Create and fill composite matrix, comp.
comp = zeros(num_str,max_len + 2*n);
comp(:,charcols) = double(s1);
comp(:,numcols) = num_val(:,activecols);
comp(:,ndigcols) = num_dig(:,activecols);
% Sort rows of composite matrix and use index to sort c in ascending or
% descending order, depending on mode.
[unused,index] = sortrows(comp);
if is_descend
    index = index(end:-1:1);
end
index = reshape(index,size(c));
cs = c(index);
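
The core idea, comparing digit runs by numeric value rather than character by character, can be seen in a much smaller sketch. This is not Schwarz's implementation: it assumes every name contains one digit run and that the names differ only in that number, and the sample file names and variable names are made up for the example.

```matlab
% Made-up names, listed in the order a plain string sort would give.
names = {'file1.txt', 'file10.txt', 'file100.txt', 'file11.txt', 'file2.txt'};

% Pull the first run of digits out of each name. With 'once', regexp
% returns one char vector per cell instead of a nested cell.
digitRuns = regexp(names, '\d+', 'match', 'once');

% Convert the digit runs to numbers and sort by value.
[~, idx] = sort(str2double(digitRuns));
sortedNames = names(idx);
disp(sortedNames);
```

The same `idx` could be used to reorder the struct array returned by dir. For names with several digit runs or mixed prefixes, the full sort_nat function above is the safer choice.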

Thanks to the people who share open-source functions on MathWorks; their work is what makes MATLAB more convenient and faster to use.
References:
http://cn.mathworks.com/matlabcentral/fileexchange/34464-customizable-natural-order-sort