CWB
See cqp/corpmanag.c for the file format that this utility decodes.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
Note: some of the code is duplicated between CQP's load-file functions and this utility. In the long term, we aim to remove this duplication (TODO).
#define SUBCORPMAGIC 36193928
Magic number of the subcorpus file format. It is defined in the CQP code (corpmanag.c). TODO: should probably be defined centrally (cl/globals.h?).
Referenced by nqrfile_print_info().
int file_length(FILE *fd)
Gets the size of the file.
fd: File handle.
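As an illustration only, one common way to get the size of an open file on a POSIX system is to call fstat() on its descriptor. The sketch below is not taken from the CWB source: the name file_length_sketch, the long return type, and the -1 error value are assumptions made for this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical sketch, not the CWB implementation.
 * Returns the size in bytes of the file behind `fd`,
 * or -1 if fstat() fails. */
long file_length_sketch(FILE *fd)
{
  struct stat st;

  /* fileno() maps the stdio stream to its POSIX file descriptor */
  if (fstat(fileno(fd), &st) != 0)
    return -1;
  return (long) st.st_size;
}
```

If the stream has been written to, the caller should fflush() it first so that buffered data is counted in the reported size.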
int main(int argc, char **argv)
Main function for cwb-decode-nqrfile.
argc: Number of command-line arguments.
argv: Command-line arguments.
References nqrfile_print_info().
int nqrfile_print_info(FILE *fd, int print_header)
Reads a subcorpus file and prints information about it to STDOUT.
"Subcorpus file" here means a file that (a) begins with the subcorpus magic number; (b) continues with a "registry" area terminated by one or more zero bytes; (c) may then contain the size of the subcorpus; and (d) consists, from there to the end of the file, of start-end range integer pairs.
The registry is printed if and only if print_header is true. The start-end pairs are printed as tab-delimited lines, one line per pair.
fd: File pointer.
print_header: Boolean; controls whether the "registry" header in the subcorpus file is printed.
References file_length(), registry, and SUBCORPMAGIC.
Referenced by main().
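The layout described for nqrfile_print_info() can be sketched as a walk over an in-memory copy of the file. This is a simplified illustration, not the CWB code: the function name print_ranges_sketch and its parameters are hypothetical, and it assumes native-endian int fields, a header that ends at the first zero byte and is padded to the next int boundary, and no size field before the pairs. The authoritative format is in cqp/corpmanag.c.

```c
#include <stdio.h>
#include <string.h>

#define SUBCORPMAGIC 36193928

/* Hypothetical sketch, not the CWB implementation.
 * Writes one "start<TAB>end" line per range pair to `out`.
 * Assumes: native-endian ints, header text ending at the first
 * zero byte, zero padding to the next int boundary, no size field.
 * Returns the number of pairs printed, or -1 on a malformed buffer. */
int print_ranges_sketch(const unsigned char *buf, size_t len,
                        int print_header, FILE *out)
{
  int magic, pair[2], n = 0;
  size_t pos = sizeof(int), hdr_end;

  /* (a) the file must start with the magic number */
  if (len < sizeof(int))
    return -1;
  memcpy(&magic, buf, sizeof(int));
  if (magic != SUBCORPMAGIC)
    return -1;

  /* (b) header text runs up to the first zero byte */
  hdr_end = pos;
  while (hdr_end < len && buf[hdr_end] != 0)
    hdr_end++;
  if (hdr_end == len)
    return -1;                      /* no terminator found */
  if (print_header)
    fprintf(out, "%.*s\n", (int) (hdr_end - pos),
            (const char *) (buf + pos));

  /* step over the terminator and round up to an int boundary */
  pos = hdr_end + 1;
  pos = (pos + sizeof(int) - 1) / sizeof(int) * sizeof(int);

  /* (d) remaining data: start-end pairs to the end of the buffer */
  while (pos + sizeof pair <= len) {
    memcpy(pair, buf + pos, sizeof pair);
    fprintf(out, "%d\t%d\n", pair[0], pair[1]);
    pos += sizeof pair;
    n++;
  }
  return n;
}
```

Copying fields out with memcpy() avoids unaligned reads from the byte buffer; a file that does carry the optional size field (c) would need one extra int skipped before the pair loop.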