Class PseudonymizeAndSequester

java.lang.Object
com.pixelmed.apps.PseudonymizeAndSequester

public class PseudonymizeAndSequester extends Object

A class to implement bulk de-identification and pseudonymization of DICOM files, with sequestration of files that may carry a risk of identity leakage.

  • Field Details

    • ourCalledAETitle

      protected static String ourCalledAETitle
    • radixForRandomPseudonymousID

      protected static int radixForRandomPseudonymousID
    • epochForDateModification

      protected static Date epochForDateModification
    • defaultEarliestDateInSet

      protected static Date defaultEarliestDateInSet
    • newPatientIDByOriginalPatientID

      protected Map<String,String> newPatientIDByOriginalPatientID
    • newPatientIDByOriginalStudyInstanceUID

      protected Map<String,String> newPatientIDByOriginalStudyInstanceUID
    • newPatientNameByNewPatientID

      protected Map<String,String> newPatientNameByNewPatientID
    • earliestDateByOrignalPatientID

      protected Map<String,Date> earliestDateByOrignalPatientID
    • random

      protected Random random
  • Constructor Details

    • PseudonymizeAndSequester

      public PseudonymizeAndSequester(String inputPathName, String outputFolderCleanName, String outputFolderDirtyName, String pseudonymizationControlFileName, String pseudonymizationResultByOriginalPatientIDFileName, String pseudonymizationResultByOriginalStudyInstanceUIDFileName, String failedFilesFileName, String uidMapResultFileName, String seed, boolean keepAllPrivate, boolean addContributingEquipmentSequence, boolean keepDescriptors, boolean keepSeriesDescriptors, boolean keepProtocolName, boolean keepPatientCharacteristics, boolean keepDeviceIdentity, boolean keepInstitutionIdentity, int handleDates, int handleStructuredContent) throws DicomException, FileNotFoundException, IOException

      Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.

      Searches the specified input path recursively for suitable files.

      The pseudonymization control file and the pseudonymization result files each contain three columns of tab-delimited UTF-8 text: the original PatientID (or the original StudyInstanceUID), the new PatientID and the new PatientName.

      Parameters:
      inputPathName - the path to search for DICOM files
      outputFolderCleanName - where to store all the low risk processed output files (must already exist)
      outputFolderDirtyName - where to store all the high risk processed output files (must already exist)
      pseudonymizationControlFileName - values to use for pseudonymization, may be null or empty in which case random values are used
      pseudonymizationResultByOriginalPatientIDFileName - file into which to store pseudonymization by original PatientID performed
      pseudonymizationResultByOriginalStudyInstanceUIDFileName - file into which to store pseudonymization by original StudyInstanceUID performed
      failedFilesFileName - file into which to store the paths of files that failed to process
      uidMapResultFileName - file into which to store the map of original to new UIDs
      seed - the initial seed used to generate random pseudonymous identifiers, as a long integer in string form (for deterministic creation of pseudonyms), or null or zero length if none
      keepAllPrivate - retain all private attributes, not just known safe ones
      addContributingEquipmentSequence - whether or not to add ContributingEquipmentSequence
      keepDescriptors - if true, keep the text description and comment attributes
      keepSeriesDescriptors - if true, keep the series description even if all other descriptors are removed
      keepProtocolName - if true, keep protocol name even if all other descriptors are removed
      keepPatientCharacteristics - if true, keep patient characteristics (such as might be needed for PET SUV calculations)
      keepDeviceIdentity - if true, keep device identity
      keepInstitutionIdentity - if true, keep institution identity
      handleDates - keep, remove or modify dates and times
      handleStructuredContent - keep, remove or modify structured content
      Throws:
      DicomException
      IOException
      FileNotFoundException
  • Method Details

    • makeOutputFileName

      protected static String makeOutputFileName(String outputFolderName, String inputFileName, String sopInstanceUID) throws IOException

      Make a suitable file name to use for a de-identified and redacted input file.

      The default is the UID plus "_Anon.dcm" in the outputFolderName (ignoring the inputFileName).

      Override this method in a subclass if a different file name is required.

      Parameters:
      outputFolderName - where to store all the processed output files
      inputFileName - the path of the input file (ignored by the default implementation)
      sopInstanceUID - the SOP Instance UID of the output file
      Throws:
      IOException - if a filename cannot be constructed
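
      The default naming rule described above can be sketched as follows. This is a stand-alone illustration, not the PixelMed source: the class name OutputFileNameSketch is hypothetical, and throwing IOException for a missing UID is an assumption about when a file name "cannot be constructed".

```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-alone sketch of the documented default; not the PixelMed implementation.
final class OutputFileNameSketch {

    // Returns "<outputFolderName>/<sopInstanceUID>_Anon.dcm".
    // inputFileName is accepted only to mirror the documented signature and is not used.
    static String makeOutputFileName(String outputFolderName, String inputFileName, String sopInstanceUID)
            throws IOException {
        if (sopInstanceUID == null || sopInstanceUID.isEmpty()) {
            // Assumption: a missing UID is treated as "a filename cannot be constructed".
            throw new IOException("Cannot construct a file name without a SOP Instance UID");
        }
        return new File(outputFolderName, sopInstanceUID + "_Anon.dcm").getPath();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(makeOutputFileName("clean", "in/IM0001", "1.2.3.4"));
    }
}
```

      Naming by SOP Instance UID keeps output names unique within a folder and avoids carrying any identifying text from the input path into the output.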
    • readPseudonymizationControlFile

      protected void readPseudonymizationControlFile(String pseudonymizationControlFileName) throws IOException

      Read a file mapping original PatientID or StudyInstanceUID to new PatientID and PatientName and add them to the maps.

      The type of file is detected from its tab-delimited header line, which takes one of two forms:
        originalPatientID newPatientID newPatientName
        originalStudyInstanceUID newPatientID newPatientName
      Parameters:
      pseudonymizationControlFileName - the control file, if any
      Throws:
      IOException
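
      As an illustration of the header-based detection described above, here is a minimal sketch. It is not the PixelMed implementation: the class ControlFileSketch and its read method are hypothetical, the sketch reads from a BufferedReader rather than a file name, and skipping short lines and rejecting unknown headers are assumptions. The map names mirror the documented fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not the PixelMed implementation.
final class ControlFileSketch {
    final Map<String,String> newPatientIDByOriginalPatientID = new HashMap<>();
    final Map<String,String> newPatientIDByOriginalStudyInstanceUID = new HashMap<>();
    final Map<String,String> newPatientNameByNewPatientID = new HashMap<>();

    void read(BufferedReader reader) throws IOException {
        String header = reader.readLine();
        if (header == null) {
            return; // empty input: nothing to load
        }
        // The first header column decides which map the rows are keyed into.
        String firstColumn = header.split("\t", -1)[0].trim();
        Map<String,String> target;
        if (firstColumn.equals("originalPatientID")) {
            target = newPatientIDByOriginalPatientID;
        } else if (firstColumn.equals("originalStudyInstanceUID")) {
            target = newPatientIDByOriginalStudyInstanceUID;
        } else {
            throw new IOException("Unrecognized header line: " + header); // assumption
        }
        String line;
        while ((line = reader.readLine()) != null) {
            String[] columns = line.split("\t", -1);
            if (columns.length < 3) {
                continue; // assumption: blank or short lines are skipped
            }
            target.put(columns[0], columns[1]);
            newPatientNameByNewPatientID.put(columns[1], columns[2]);
        }
    }

    public static void main(String[] args) throws IOException {
        ControlFileSketch sketch = new ControlFileSketch();
        sketch.read(new BufferedReader(new StringReader(
                "originalPatientID\tnewPatientID\tnewPatientName\nMRN123\tANON1\tAnon^One\n")));
        System.out.println(sketch.newPatientIDByOriginalPatientID);
    }
}
```

      When reading a real control file, a reader opened with java.nio.file.Files.newBufferedReader(path, StandardCharsets.UTF_8) would match the UTF-8 encoding the documentation specifies.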
    • createNewPseudonymousPatientAndAddToMaps

      protected String createNewPseudonymousPatientAndAddToMaps(String originalPatientID, String originalStudyInstanceUID)

      Create a new PatientID and PatientName and add them to the maps.

      Parameters:
      originalPatientID - the old PatientID
      originalStudyInstanceUID - the old StudyInstanceUID
      Returns:
      the new PatientID
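
      One way such seeded, radix-based pseudonym generation might look is sketched below. This is an assumption-laden illustration, not the PixelMed implementation: the class PseudonymSketch, the radix value of 36, the upper-casing, the collision retry and the choice to reuse the new PatientID as the new PatientName are all hypothetical. Only the map names, the random field and the idea of a string seed come from the documentation above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-alone sketch; not the PixelMed implementation.
final class PseudonymSketch {
    static final int RADIX = 36; // assumption: the real radixForRandomPseudonymousID value is not documented here

    final Map<String,String> newPatientIDByOriginalPatientID = new HashMap<>();
    final Map<String,String> newPatientIDByOriginalStudyInstanceUID = new HashMap<>();
    final Map<String,String> newPatientNameByNewPatientID = new HashMap<>();
    final Random random;

    // seed is a long integer in string form, or null/empty for a non-deterministic generator.
    PseudonymSketch(String seed) {
        random = (seed == null || seed.isEmpty()) ? new Random() : new Random(Long.parseLong(seed));
    }

    String createNewPseudonymousPatientAndAddToMaps(String originalPatientID, String originalStudyInstanceUID) {
        String newPatientID;
        do {
            // Mask off the sign bit so the rendered identifier never starts with '-'.
            newPatientID = Long.toString(random.nextLong() & Long.MAX_VALUE, RADIX).toUpperCase();
        } while (newPatientNameByNewPatientID.containsKey(newPatientID)); // retry on collision (assumption)

        if (originalPatientID != null && !originalPatientID.isEmpty()) {
            newPatientIDByOriginalPatientID.put(originalPatientID, newPatientID);
        }
        if (originalStudyInstanceUID != null && !originalStudyInstanceUID.isEmpty()) {
            newPatientIDByOriginalStudyInstanceUID.put(originalStudyInstanceUID, newPatientID);
        }
        newPatientNameByNewPatientID.put(newPatientID, newPatientID); // assumption: name reuses the ID
        return newPatientID;
    }

    public static void main(String[] args) {
        PseudonymSketch sketch = new PseudonymSketch("42");
        System.out.println(sketch.createNewPseudonymousPatientAndAddToMaps("MRN123", "1.2.3"));
    }
}
```

      With the same seed and the same order of inputs, two runs of this sketch produce the same identifiers, which is the property the seed parameter is documented to provide.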
    • writePseudonymizationResultByOriginalPatientID

      protected void writePseudonymizationResultByOriginalPatientID(PrintWriter w)
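
      This method has no description here, but the result file format is documented elsewhere on this page as tab-delimited text with a header row. A minimal sketch of writing such a file follows. It is not the PixelMed implementation: the class ResultWriterSketch, the explicit map parameters, the sorted output order and the empty string for a missing name are assumptions.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; not the PixelMed implementation.
final class ResultWriterSketch {

    // Writes a header row, then one tab-delimited row per original PatientID.
    static void writeByOriginalPatientID(PrintWriter w,
                                         Map<String,String> newPatientIDByOriginalPatientID,
                                         Map<String,String> newPatientNameByNewPatientID) {
        w.print("originalPatientID\tnewPatientID\tnewPatientName\n");
        // Sorting by original PatientID makes the output reproducible (assumption).
        for (Map.Entry<String,String> entry : new TreeMap<>(newPatientIDByOriginalPatientID).entrySet()) {
            String newPatientID = entry.getValue();
            String newPatientName = newPatientNameByNewPatientID.getOrDefault(newPatientID, "");
            w.print(entry.getKey() + "\t" + newPatientID + "\t" + newPatientName + "\n");
        }
        w.flush();
    }

    public static void main(String[] args) {
        Map<String,String> ids = new HashMap<>();
        ids.put("MRN123", "ANON1");
        Map<String,String> names = new HashMap<>();
        names.put("ANON1", "Anon^One");
        StringWriter out = new StringWriter();
        writeByOriginalPatientID(new PrintWriter(out), ids, names);
        System.out.print(out);
    }
}
```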
    • writePseudonymizationResultByOriginalStudyInstanceUID

      protected void writePseudonymizationResultByOriginalStudyInstanceUID(PrintWriter w)
    • writeUIDMapResult

      protected void writeUIDMapResult(PrintWriter uidMapResultWriter)
    • containsOverlay

      protected static boolean containsOverlay(AttributeList list)
    • isDirty

      protected static boolean isDirty(AttributeList list)
    • main

      public static void main(String[] arg)

      Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.

      Searches the specified input path recursively for suitable files.

      The pseudonymizationControlFile and pseudonymizationResultFile are tab delimited, with a header row containing either:
        originalPatientID newPatientID newPatientName
      or:
        originalStudyInstanceUID newPatientID newPatientName
      Parameters:
      arg - eight or nine parameters plus options: the inputPath (file or folder), outputFolderClean, outputFolderDirty, pseudonymizationControlFile, pseudonymizationResultByOriginalPatientIDFile, pseudonymizationResultByOriginalStudyInstanceUIDFile, failedFilesFile, uidMapResultFile, and optionally a random seed for deterministic creation of pseudonyms, then various options controlling de-identification