Class PseudonymizeAndSequester
A class to implement bulk de-identification and pseudonymization of DICOM files with sequestration of files that may have risk of identity leakage.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
protected class
A protected class that actually does all the work of finding and processing the files.
-
Field Summary
Fields -
Constructor Summary
Constructors
Constructor, Description
PseudonymizeAndSequester(String inputPathName, String outputFolderCleanName, String outputFolderDirtyName, String pseudonymizationControlFileName, String pseudonymizationResultByOriginalPatientIDFileName, String pseudonymizationResultByOriginalStudyInstanceUIDFileName, String failedFilesFileName, String uidMapResultFileName, String seed, boolean keepAllPrivate, boolean addContributingEquipmentSequence, boolean keepDescriptors, boolean keepSeriesDescriptors, boolean keepProtocolName, boolean keepPatientCharacteristics, boolean keepDeviceIdentity, boolean keepInstitutionIdentity, int handleDates, int handleStructuredContent)
Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
-
Method Summary
Modifier and Type, Method, Description
protected static boolean containsOverlay(AttributeList list)
protected String createNewPseudonymousPatientAndAddToMaps(String originalPatientID, String originalStudyInstanceUID)
Create a new PatientID and PatientName and add them to the maps.
protected static boolean isDirty(AttributeList list)
static void
Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
protected static String makeOutputFileName(String outputFolderName, String inputFileName, String sopInstanceUID)
Make a suitable file name to use for a deidentified and redacted input file.
protected void readPseudonymizationControlFile(String pseudonymizationControlFileName)
Read a file mapping original PatientID or StudyInstanceUID to new PatientID and PatientName and add them to the maps.
protected void
protected void
protected void writeUIDMapResult(PrintWriter uidMapResultWriter)
-
Field Details
-
ourCalledAETitle
-
radixForRandomPseudonymousID
protected static int radixForRandomPseudonymousID -
epochForDateModification
-
defaultEarliestDateInSet
-
newPatientIDByOriginalPatientID
-
newPatientIDByOriginalStudyInstanceUID
-
newPatientNameByNewPatientID
-
earliestDateByOrignalPatientID
-
random
-
-
Constructor Details
-
PseudonymizeAndSequester
public PseudonymizeAndSequester(String inputPathName, String outputFolderCleanName, String outputFolderDirtyName, String pseudonymizationControlFileName, String pseudonymizationResultByOriginalPatientIDFileName, String pseudonymizationResultByOriginalStudyInstanceUIDFileName, String failedFilesFileName, String uidMapResultFileName, String seed, boolean keepAllPrivate, boolean addContributingEquipmentSequence, boolean keepDescriptors, boolean keepSeriesDescriptors, boolean keepProtocolName, boolean keepPatientCharacteristics, boolean keepDeviceIdentity, boolean keepInstitutionIdentity, int handleDates, int handleStructuredContent) throws DicomException, FileNotFoundException, IOException
Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
Searches the specified input path recursively for suitable files.
The pseudonymizationControlFileName file and the two pseudonymizationResult files each contain three columns of tab-delimited UTF-8 text: the original PatientID or StudyInstanceUID, the new PatientID and the new PatientName.
Parameters:
inputPathName - the path to search for DICOM files
outputFolderCleanName - where to store all the low risk processed output files (must already exist)
outputFolderDirtyName - where to store all the high risk processed output files (must already exist)
pseudonymizationControlFileName - values to use for pseudonymization, may be null or empty in which case random values are used
pseudonymizationResultByOriginalPatientIDFileName - file into which to store pseudonymization by original PatientID performed
pseudonymizationResultByOriginalStudyInstanceUIDFileName - file into which to store pseudonymization by original StudyInstanceUID performed
failedFilesFileName - file into which to store the paths of files that failed to process
uidMapResultFileName - file into which to store the map of original to new UIDs
seed - the initial seed to generate random pseudonymous identifiers, long integer as string or null or zero length if none (for deterministic creation of pseudonyms)
keepAllPrivate - retain all private attributes, not just known safe ones
addContributingEquipmentSequence - whether or not to add ContributingEquipmentSequence
keepDescriptors - if true, keep the text description and comment attributes
keepSeriesDescriptors - if true, keep the series description even if all other descriptors are removed
keepProtocolName - if true, keep protocol name even if all other descriptors are removed
keepPatientCharacteristics - if true, keep patient characteristics (such as might be needed for PET SUV calculations)
keepDeviceIdentity - if true, keep device identity
keepInstitutionIdentity - if true, keep institution identity
handleDates - keep, remove or modify dates and times
handleStructuredContent - keep, remove or modify structured content
Throws:
DicomException
IOException
FileNotFoundException
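As a rough illustration of the three-column, tab-delimited layout described for the result files, the following stand-alone sketch writes a table keyed by original PatientID. The class name ResultFileSketch, the method write and the map types are assumptions made for illustration; they are not taken from the toolkit.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the documented three-column layout; not the toolkit's code.
class ResultFileSketch {
    // Writes a header row, then one tab-delimited row per original PatientID.
    static void write(PrintWriter writer,
                      Map<String, String> newPatientIDByOriginalPatientID,
                      Map<String, String> newPatientNameByNewPatientID) {
        writer.print("originalPatientID\tnewPatientID\tnewPatientName\n");
        for (Map.Entry<String, String> entry : newPatientIDByOriginalPatientID.entrySet()) {
            String newPatientID = entry.getValue();
            String newPatientName = newPatientNameByNewPatientID.get(newPatientID);
            writer.print(entry.getKey() + "\t" + newPatientID + "\t" + newPatientName + "\n");
        }
        writer.flush();
    }

    public static void main(String[] args) {
        // Made-up sample values, held in sorted maps so the row order is stable.
        Map<String, String> ids = new TreeMap<>();
        ids.put("MRN001", "ANON1");
        Map<String, String> names = new TreeMap<>();
        names.put("ANON1", "Anonymous^One");
        StringWriter buffer = new StringWriter();
        write(new PrintWriter(buffer), ids, names);
        System.out.print(buffer);
    }
}
```

When writing to a real file, the UTF-8 encoding would need to be requested explicitly, for example by wrapping a FileOutputStream in an OutputStreamWriter created with StandardCharsets.UTF_8.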
-
-
Method Details
-
makeOutputFileName
protected static String makeOutputFileName(String outputFolderName, String inputFileName, String sopInstanceUID) throws IOException
Make a suitable file name to use for a deidentified and redacted input file.
The default is the UID plus "_Anon.dcm" in the outputFolderName (ignoring the inputFileName).
Override this method in a subclass if a different file name is required.
Parameters:
outputFolderName - where to store all the processed output files
inputFileName - the path to search for DICOM files
sopInstanceUID - the SOP Instance UID of the output file
Throws:
IOException - if a filename cannot be constructed
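The default naming rule described for makeOutputFileName can be sketched roughly as follows. This is a stand-alone illustration, not the toolkit's source: the class name OutputFileNameSketch is hypothetical, and treating a missing UID as the failure case is an assumption.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-alone sketch of the documented default; not the toolkit's code.
class OutputFileNameSketch {
    // Builds <outputFolderName>/<sopInstanceUID>_Anon.dcm and ignores inputFileName,
    // as the documented default does.
    static String makeOutputFileName(String outputFolderName, String inputFileName,
                                     String sopInstanceUID) throws IOException {
        if (sopInstanceUID == null || sopInstanceUID.isEmpty()) {
            // Assumption: a missing UID is what makes a name impossible to construct.
            throw new IOException("Cannot construct output file name without a SOP Instance UID");
        }
        return new File(outputFolderName, sopInstanceUID + "_Anon.dcm").getPath();
    }

    public static void main(String[] args) throws IOException {
        // Made-up folder, input path and UID.
        System.out.println(makeOutputFileName("clean", "in/IM0001", "1.2.3.4"));
    }
}
```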
-
readPseudonymizationControlFile
protected void readPseudonymizationControlFile(String pseudonymizationControlFileName) throws IOException
Read a file mapping original PatientID or StudyInstanceUID to new PatientID and PatientName and add them to the maps.
Type of file is detected based on header line of the form:
originalPatientID newPatientID newPatientName
or
originalStudyInstanceUID newPatientID newPatientName
Parameters:
pseudonymizationControlFileName - the control file, if any
Throws:
IOException
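A minimal sketch of header-based detection for such a control file might look like the following. The class ControlFileSketch and its read method are hypothetical. The map names mirror the fields listed under Field Details, but their types (String to String) are assumed, as is the choice to reject an unrecognized header with an IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of header-based detection; not the toolkit's code.
class ControlFileSketch {
    // Map names mirror the documented fields; their types are assumed.
    final Map<String, String> newPatientIDByOriginalPatientID = new HashMap<>();
    final Map<String, String> newPatientIDByOriginalStudyInstanceUID = new HashMap<>();
    final Map<String, String> newPatientNameByNewPatientID = new HashMap<>();

    void read(BufferedReader reader) throws IOException {
        String header = reader.readLine();
        if (header == null) {
            return; // empty file: nothing to load
        }
        // The first header column decides which map the rows are keyed into.
        String firstColumn = header.split("\t", -1)[0].trim();
        Map<String, String> target;
        if (firstColumn.equals("originalPatientID")) {
            target = newPatientIDByOriginalPatientID;
        } else if (firstColumn.equals("originalStudyInstanceUID")) {
            target = newPatientIDByOriginalStudyInstanceUID;
        } else {
            throw new IOException("Unrecognized header line: " + header);
        }
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] columns = line.split("\t", -1);
            if (columns.length < 3) {
                throw new IOException("Expected three tab-delimited columns: " + line);
            }
            target.put(columns[0], columns[1]);
            newPatientNameByNewPatientID.put(columns[1], columns[2]);
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample content keyed by StudyInstanceUID.
        String text = "originalStudyInstanceUID\tnewPatientID\tnewPatientName\n"
                + "1.2.840.1\tANON1\tAnonymous^One\n";
        ControlFileSketch sketch = new ControlFileSketch();
        sketch.read(new BufferedReader(new StringReader(text)));
        System.out.println(sketch.newPatientIDByOriginalStudyInstanceUID);
    }
}
```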
-
createNewPseudonymousPatientAndAddToMaps
protected String createNewPseudonymousPatientAndAddToMaps(String originalPatientID, String originalStudyInstanceUID)
Create a new PatientID and PatientName and add them to the maps.
Parameters:
originalPatientID - the old PatientID
originalStudyInstanceUID - the old StudyInstanceUID
Returns:
the new PatientID
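One way a seeded generator could give deterministic pseudonyms is sketched below. This is an illustration under several assumptions and not the toolkit's algorithm: the class PseudonymSketch is hypothetical, the radix of 36 is a guess (the value of radixForRandomPseudonymousID is not given in this documentation), the new PatientName is simply set to the new PatientID, and collision checking is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of seeded pseudonym creation; not the toolkit's algorithm.
class PseudonymSketch {
    static final int RADIX = 36; // assumed value, not taken from the toolkit

    // Map names mirror the documented fields; their types are assumed.
    final Map<String, String> newPatientIDByOriginalPatientID = new HashMap<>();
    final Map<String, String> newPatientIDByOriginalStudyInstanceUID = new HashMap<>();
    final Map<String, String> newPatientNameByNewPatientID = new HashMap<>();
    final Random random;

    // A null or zero length seed means no fixed seed, as the seed parameter describes.
    PseudonymSketch(String seed) {
        random = (seed == null || seed.isEmpty()) ? new Random() : new Random(Long.parseLong(seed));
    }

    String createNewPseudonymousPatientAndAddToMaps(String originalPatientID,
                                                    String originalStudyInstanceUID) {
        long value = random.nextLong() & Long.MAX_VALUE; // clear the sign bit
        String newPatientID = Long.toString(value, RADIX).toUpperCase();
        String newPatientName = newPatientID; // assumption: reuse the ID as the name
        if (originalPatientID != null && !originalPatientID.isEmpty()) {
            newPatientIDByOriginalPatientID.put(originalPatientID, newPatientID);
        }
        if (originalStudyInstanceUID != null && !originalStudyInstanceUID.isEmpty()) {
            newPatientIDByOriginalStudyInstanceUID.put(originalStudyInstanceUID, newPatientID);
        }
        newPatientNameByNewPatientID.put(newPatientID, newPatientName);
        return newPatientID;
    }

    public static void main(String[] args) {
        // Two instances with the same seed should print the same identifier.
        System.out.println(new PseudonymSketch("42").createNewPseudonymousPatientAndAddToMaps("MRN001", "1.2.840.1"));
        System.out.println(new PseudonymSketch("42").createNewPseudonymousPatientAndAddToMaps("MRN001", "1.2.840.1"));
    }
}
```

Because java.util.Random produces the same sequence for the same seed, repeating a run with the same seed and the same input order would yield the same pseudonyms in this sketch.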
-
writePseudonymizationResultByOriginalPatientID
-
writePseudonymizationResultByOriginalStudyInstanceUID
-
writeUIDMapResult
-
containsOverlay
-
isDirty
-
main
Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
Searches the specified input path recursively for suitable files.
The pseudonymizationControlFile and pseudonymizationResultFile are tab delimited with a header row containing either:
originalPatientID newPatientID newPatientName
or
originalStudyInstanceUID newPatientID newPatientName
Parameters:
arg - seven or eight parameters plus options, the inputPath (file or folder), outputFolderClean, outputFolderDirty, pseudonymizationControlFile, pseudonymizationResultByOriginalPatientIDFile, pseudonymizationResultByOriginalStudyInstanceUIDFile, failedFilesFile, uidMapResultFile, and optionally a random seed for deterministic creation of pseudonyms, then various options controlling de-identification
-