Prepare the IDM (Enhanced) Fingerprint File
Fingerprint files are created when you train the data source file using the DLP Integrator, which includes the IDMTrain tool. As a prerequisite for index document matching, the data source file must be trained using the IDMTrain tool to generate the .db file.
All values in the data source file are normalized and hashed in the fingerprint file, regardless of the definition you use in classifications.
Install the DLP Integrator
To use IDM, install DLP Integrator v.6.4.0, which includes the IDMTrain tool and is supported on both Windows and Linux platforms.
For more information, see:
Generate the database (.db) files using the IDMTrain Tool
To train the data source file, run the idmtrain command with the options below. You can use the command line interface (CLI) or any third-party data transfer tool, such as PuTTY.
CLI Option - Short form | CLI Option - Full form | Description |
---|---|---|
-? | --help | Shows the IDMTrain tool help and exits |
-v | --verbose | Shows verbose output |
-V | --version | Displays version information and exits |
-q | --quantity | Performs an extra initial dummy training pass to determine the quantity of files |
-A | --all-files | Processes all files (including hidden files) |
-E | --no-errors | Specifies not to generate error messages or enforce thresholds |
-W | --no-warnings | Specifies not to generate warning messages |
-j | --json file | Outputs progress and exit status to file as JSON |
-r | --report file | Outputs training information to file as JSON |
-o | --output file | Specifies the resultant database to create |
-e | --errors % (=5) | Specifies the error threshold as a percentage (default: 5) |
-D | --db-name name | Specifies the database name (default is based on the output file name) |
-x | --exclude pat ... | Excludes files that match a case-insensitive MS-DOS pattern |
-p | --progress [=secs(=2)] | Shows the progress after the specified interval in seconds (default: 2) |
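To preview which file names an -x pattern might exclude before training, the following Python sketch may help. It assumes that "MS-DOS pattern" means the `*` and `?` wildcards only and that the pattern is matched against the file name rather than the full path; neither detail is confirmed here, and real MS-DOS matching has additional quirks that this simplification ignores. The function names are hypothetical and are not part of the IDMTrain tool.

```python
import re


def dos_pattern_to_regex(pattern):
    """Translate a simplified MS-DOS wildcard pattern into a compiled regex.

    Assumption: only '*' (any run of characters) and '?' (any single
    character) are special; every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # IGNORECASE mirrors the "case insensitive" wording of the -x option.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def is_excluded(filename, patterns):
    """Return True if the file name fully matches any of the patterns."""
    return any(dos_pattern_to_regex(p).fullmatch(filename) for p in patterns)


if __name__ == "__main__":
    for name in ["REPORT.TMP", "notes.txt", "draft1.doc"]:
        print(name, is_excluded(name, ["*.tmp", "draft?.doc"]))
```

Treat the result as a rough preview only; the tool's own matching is authoritative.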
Position dependent options
CLI Option - Short form | CLI Option - Full form | Description |
---|---|---|
-I | --ignore | Train following paths to ignore rather than classify |
-C | --class guid ... | Train following paths with these classifications |
-P | --path path ... | Train files directly under these paths |
-R | --rpath path ... | Train files recursively under these paths |
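Because these options are position dependent, the classification option has to come before the paths it applies to. As an illustration, the following Python sketch assembles an idmtrain argument list in that order without running it. The helper name, file names, and GUID are placeholders invented for this example; only the option letters come from the tables above.

```python
import shlex


def build_idmtrain_command(output_db, class_guids, paths, recursive=True,
                           report_file=None, exclude=()):
    """Assemble an idmtrain argument list (hypothetical helper)."""
    cmd = ["idmtrain", "-o", output_db]
    if report_file is not None:
        cmd += ["-r", report_file]      # training information as JSON
    if exclude:
        cmd += ["-x", *exclude]         # case-insensitive MS-DOS patterns
    # Position dependent: -C applies to the paths that follow it.
    cmd += ["-C", *class_guids]
    cmd += ["-R" if recursive else "-P", *paths]
    return cmd


if __name__ == "__main__":
    cmd = build_idmtrain_command(
        "hr_documents.db",                          # placeholder output file
        ["00000000-0000-0000-0000-000000000000"],   # placeholder GUID
        ["/data/hr"],                               # placeholder path
        report_file="train_report.json",
        exclude=["*.tmp"],
    )
    # Print a shell-quoted command line for review before running it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps patterns such as `*.tmp` from being expanded by the shell if the list is later passed to `subprocess.run`.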
Note the following points for a standard run:
- Hidden files are not trained. On Windows, a file is hidden if it has the hidden attribute set or its name starts with "."; on Linux, a file is hidden if its name starts with ".". On all operating systems, __MACOSX folders are not trained. To train hidden files, use the -A or --all-files option.
- Directory symbolic links are not followed.
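The default selection rules above can be approximated with a short Python sketch, which may be useful for previewing which files a standard run would pick up. This is not the tool's implementation. It applies only the rules stated above, and it assumes that hidden directories are still traversed, which is not specified here. The function names are hypothetical.

```python
import os
import stat


def is_hidden(path):
    """Return True if the file name starts with '.' or, on Windows,
    the file has the hidden attribute set."""
    if os.path.basename(path).startswith("."):
        return True
    # st_file_attributes exists only on Windows; default to 0 elsewhere.
    attrs = getattr(os.lstat(path), "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)


def files_trained_by_default(root, recursive=True):
    """List files under root that the stated default rules would select."""
    selected = []
    # followlinks=False: directory symbolic links are not followed.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Skip __MACOSX folders; do not descend at all if not recursive.
        dirnames[:] = [d for d in dirnames if d != "__MACOSX"] if recursive else []
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not is_hidden(full):
                selected.append(full)
    return sorted(selected)


if __name__ == "__main__":
    for path in files_trained_by_default("."):
        print(path)
```

Passing `recursive=False` loosely mirrors -P (files directly under the path), while the default mirrors -R.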
Note: If -r is not specified, warnings and errors go to stderr and a completion message is displayed. If -p is specified, progress output is produced, and it goes to the JSON file if -j is used. The -q option is required for percentage progress.
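As a small illustration of how these options combine, the following Python sketch collects the arguments needed for percentage progress written to a JSON file. The helper name is hypothetical, and the `--progress=<secs>` spelling is an assumption inferred from the `[=secs(=2)]` notation in the options table; check the tool's --help output before relying on it.

```python
def progress_options(json_file, interval_secs=None):
    """Return idmtrain options for percentage progress written to a JSON file
    (hypothetical helper)."""
    opts = ["-q"]  # required for percentage progress
    if interval_secs is None:
        opts.append("-p")  # use the default interval
    else:
        # Assumed syntax for an explicit interval; not confirmed by the source.
        opts.append(f"--progress={interval_secs}")
    opts += ["-j", json_file]  # progress and exit status go to this file
    return opts


if __name__ == "__main__":
    print(progress_options("status.json"))
    print(progress_options("status.json", interval_secs=10))
```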