Dealing with large data and Matlab


A Matlab importer is available for loading AX3 files (.cwa) directly into the workspace. For smaller files (~1 day) it is normally fine to read the whole file in one go. However, for larger multi-day files the amount of RAM needed to hold the complete data set can be an issue. This becomes even more of an issue if multiple datasets need to be compared and are therefore held in the same workspace at the same time. In this case we have found that the best approach is to split the data into manageable, bite-sized chunks, for example one hour each. The Matlab script accepts startTime and stopTime arguments, so only the requested time range is read. It can also very quickly read out the validPackets in a file. The validPackets structure holds all the information needed to calculate the startTime and stopTime for each chunk.


A practical example might look something like:


filename = 'dataset.cwa';
% quick read of the file info (start/stop times, validPackets)
fileinfo = AX3_readFile(filename, 'info', 1, 'useC', 1);
fileStart = fileinfo.start.mtime; %Matlab datenum, in days
fileStop = fileinfo.stop.mtime;
numChunks = ceil((fileStop - fileStart)*24); %hourly chunks; the last one may be shorter

% pre-calculate for parallel toolbox compatibility
startTime = zeros(1, numChunks);
stopTime = zeros(1, numChunks);
startTime(1) = fileStart;
stopTime(1) = fileStart + (1/24); %1 hour in from the start
for i=2:numChunks
    startTime(i) = stopTime(i-1) + 1/86400; %start one second after the previous stop time
    stopTime(i) = fileStart + (i/24);
end
stopTime(numChunks) = fileStop; %the last chunk ends at the actual end of the file

for i=1:numChunks
    tmpData = AX3_readFile(filename, 'validPackets', fileinfo.validPackets, 'startTime', startTime(i), 'stopTime', stopTime(i));
    fprintf('hour %d\r\n',i);
    % Strut your stuff
    clear tmpData
end
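
The chunk boundaries can also be computed without a loop. The following is a minimal sketch of that idea, not part of the AX3 library: it does not call AX3_readFile at all, and the start and stop times are made-up example values standing in for fileinfo.start.mtime and fileinfo.stop.mtime. The variable names chunkDays, oneSecond and edges are introduced here for illustration only.

```matlab
% Hypothetical example values; in practice these would come from
% fileinfo.start.mtime and fileinfo.stop.mtime (Matlab datenums, in days).
fileStart = datenum(2012, 1, 1, 9, 30, 0);
fileStop  = datenum(2012, 1, 1, 13, 10, 0);   % 3 h 40 min after fileStart

chunkDays = 1/24;       % chunk length in days (one hour)
oneSecond = 1/86400;    % one second in days

% Number of chunks needed to cover the file; the last one may be partial.
numChunks = ceil((fileStop - fileStart) / chunkDays);

% Hour boundaries measured from the start of the file.
edges = fileStart + (0:numChunks) * chunkDays;

% Each chunk starts one second after the previous chunk stops,
% except the first, which starts at the start of the file.
startTime = edges(1:end-1);
startTime(2:end) = startTime(2:end) + oneSecond;

% Each chunk stops on an hour boundary, except the last,
% which stops at the actual end of the file.
stopTime = edges(2:end);
stopTime(end) = fileStop;

% Print the chunk table for inspection.
for i = 1:numChunks
    fprintf('chunk %d: %s to %s\n', i, ...
        datestr(startTime(i), 'HH:MM:SS'), datestr(stopTime(i), 'HH:MM:SS'));
end
```

The resulting startTime and stopTime vectors have one entry per chunk and can be passed to the read loop unchanged. Changing chunkDays (for example to 6/24 for six-hour chunks) trades fewer reads against more RAM per chunk.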


The above code was tested on a machine with 8 GB of RAM and read in a 7-day data file in approximately 1 minute 30 seconds.


The libraries containing the AX3_readFile code are HERE