Okay, after telling you why you should start learning computer programming, in this post I will walk through the code that I used for a task at my office. If you have not read the reasons why you should start learning computer programming, you can go to the post here. Okay, let’s start the technical part.
Looking back at the document we have to process, what is the (human) logic here? I will repost the screenshot of the NIST CSF document.
So, for me, the logic is really simple.
- We will loop over every subcategory, because that is our main concern.
- For every subcategory, we will look at the informative references and find the cell that contains “NIST SP 800–53 Rev. 4”.
- The first several rows look consistent, so we can skip the “NIST SP 800–53 Rev. 4” prefix and take only the data after it.
- We will split the selected string by comma (‘,’), since the controls are listed like “CP-2, IR-4, IR-8”.
- We will loop over every separated item and remove its leading and trailing spaces (‘ ’) so the data is cleaner.
- We will open the reference document, which is “NIST SP 800–53 Rev. 5” (I use Rev. 5 instead of Rev. 4 because I could not find Rev. 4 as an Excel file).
- We will look up the details of each control by searching for the items from step 5 in the “Control Identifier” column of the “NIST SP 800–53 Rev. 5” document.
- Because this is an official document, I assume that all the data we need is available, so there will not be any “no reference” error.
- We will combine everything and export it to an Excel file.
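The string handling in the plan above (skip the prefix, split, trim) can be sketched in a few lines. This is only a minimal illustration under assumptions: the names `PREFIX` and `parse_controls` are made up for this example, and it assumes the cell always starts with exactly this Rev. 4 prefix, written with a plain hyphen.

```python
# Assumed exact prefix of the cell (22 characters, including the trailing space).
PREFIX = "NIST SP 800-53 Rev. 4 "

def parse_controls(cell):
    """Return the control identifiers listed after the prefix (hypothetical helper)."""
    rest = cell[len(PREFIX):]  # skip the fixed prefix
    # Split on commas and trim the spaces around every item.
    return [item.strip() for item in rest.split(",") if item.strip()]

print(parse_controls("NIST SP 800-53 Rev. 4 CP-2, IR-4, IR-8"))
# ['CP-2', 'IR-4', 'IR-8']
```

Using `len(PREFIX)` instead of a hard-coded number keeps the slice correct if the prefix text ever changes.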
Those steps came to mind right away. After a deeper analysis, I found additional points that need to be considered.
- The rows under the “Subcategory” column are merged for every subcategory. It is better to normalize them, so I unmerged the cells and filled the value into every unmerged cell.
- Some controls may have additional points; for example, CM-8 has 9 additional points. Because of this, we need extra code to check whether a control has additional points.
That concludes the plan for now. Next, we will build the code. Before that, we need to tidy up our Excel file. By tidying up I mean removing the merged cells and filling in all the cells left empty by the unmerging process. You can learn how to do it here.
With that, we are ready!
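The tidy-up was done in Excel, but the same fill-down can also be done in pandas. The sketch below is an alternative, not the method used in this post. It assumes that after unmerging only the first row of each block keeps its value, and the sample values are illustrative only.

```python
import pandas as pd

# Illustrative rows as they might look after unmerging:
# only the first row of each former merged block still has a value.
df = pd.DataFrame({
    "Function": ["IDENTIFY", None, None, None],
    "Category": ["Asset Management", None, None, None],
    "Subcategory": ["ID.AM-1", None, "ID.AM-2", None],
    "Informative References": ["ref A", "ref B", "ref C", "ref D"],
})

# Forward-fill copies the last seen value down into the empty cells.
merged_cols = ["Function", "Category", "Subcategory"]
df[merged_cols] = df[merged_cols].ffill()

print(df["Subcategory"].tolist())
# ['ID.AM-1', 'ID.AM-1', 'ID.AM-2', 'ID.AM-2']
```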
Snippet 1 above does the following:
- Line 2. We import the necessary library; in this case, we use the “pandas” library.
- Line 4–6. We load the Excel file into a data frame and keep only the rows whose “Informative References” value starts with “NIST”.
- Line 8. We initialize an empty list as a placeholder for our data.
- Line 9–26. We loop. For every subcategory row, we map it to the NIST SP 800–53 rev 5 controls.
- Line 12–14. We get the values of the function, category, and subcategory. We will use them when we build the Excel file.
- Line 16. We get the value of the cell in the “Informative References” column. The value must be something like “NIST SP 800–53 Rev. 4 CP-2, IR-4, IR-8”.
- Line 17. We get the list of controls we need by skipping the first 22 characters (to remove the “NIST SP 800–53 Rev. 4 ” part) and taking the rest (that is what [22:] means).
- Line 18. We split the string by comma “,” so we can go through each control.
- Line 20–26. We do another loop. For every control, we look up its details.
- Line 22. We call the “get_nist800r5_ctrl_detail” function, which we will examine shortly. The point is that this function gives us the mapping of a subcategory to the NIST SP 800–53 rev 5 controls.
- Line 23–25. We append 3 columns (Function, Category, Subcategory) and give them the values we got before (Line 12–14).
- Line 26. We append the result to the placeholder we prepared before (Line 8).
- Line 28. We convert the placeholder into a data frame (so it can be exported to Excel).
- Line 29. We rename the columns so they are more informative.
- Line 30. We change the column order.
- Line 32. We export the final data frame to Excel.
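To make the walkthrough concrete, here is a minimal sketch of the same flow on a tiny in-memory table. It is not the original snippet and its line numbers do not match the list above. `lookup_stub` is a hypothetical stand-in for “get_nist800r5_ctrl_detail”, the sample rows and the output file name are illustrative, and `pd.concat` is used here as one way to turn the placeholder list into a single data frame.

```python
import pandas as pd

# Illustrative stand-in for the tidied CSF sheet; the cell values are examples only.
csf = pd.DataFrame({
    "Function": ["IDENTIFY", "IDENTIFY", "PROTECT"],
    "Category": ["Asset Management", "Asset Management", "Access Control"],
    "Subcategory": ["ID.AM-1", "ID.AM-1", "PR.AC-1"],
    "Informative References": [
        "CIS CSC 1",
        "NIST SP 800-53 Rev. 4 CM-8, PM-5",
        "NIST SP 800-53 Rev. 4 AC-1, AC-2",
    ],
})

def lookup_stub(control_id):
    """Hypothetical stand-in for get_nist800r5_ctrl_detail: one row per control."""
    return pd.DataFrame({"control_identifier": [control_id], "control_name": ["-"]})

# Keep only the rows whose reference cell starts with "NIST".
nist_rows = csf[csf["Informative References"].str.startswith("NIST")]

frames = []  # placeholder that collects one small data frame per control
for _, row in nist_rows.iterrows():
    # Skip the 22-character prefix "NIST SP 800-53 Rev. 4 ", then split on commas.
    controls = row["Informative References"][22:].split(",")
    for control in controls:
        detail = lookup_stub(control.strip())
        detail["function"] = row["Function"]
        detail["category"] = row["Category"]
        detail["subcategory"] = row["Subcategory"]
        frames.append(detail)

# Combine the pieces, rename the columns, and put them in a readable order.
result = pd.concat(frames, ignore_index=True)
result = result.rename(columns={
    "function": "Function",
    "category": "Category",
    "subcategory": "Subcategory",
    "control_identifier": "Control Identifier",
    "control_name": "Control Name",
})
result = result[["Function", "Category", "Subcategory", "Control Identifier", "Control Name"]]
print(result)

# Exporting needs an Excel writer such as openpyxl; the file name is hypothetical.
# result.to_excel("csf_to_800_53.xlsx", index=False)
```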
Note: If you wonder how I came up with all the syntax, I googled every line. Of course, I did not copy the whole thing; I wrote the code myself, but I did look up the syntax on Google.
The second part we will examine is the “get_nist800r5_ctrl_detail” function.
This looks like a long one, but bear with me; it is not that hard. If you look carefully, a lot of the code repeats. Basically, the idea is simple.
- Line 4. We can ignore this (literally). This value is used to add a note that every row generated by this function comes from the “NIST 800–53 rev 5” document.
- Line 5. We load the “NIST SP 800–53 rev 5” document into a data frame.
- Line 7. We prepare the placeholder for our data.
- Line 9. We take the row that exactly matches the nist800r5_ctrlID value we pass in. For example, if we pass “CM-8”, this code takes only the row whose “Control Identifier” value is “CM-8”.
- Line 10–14. We set default values for several variables: “control_identifier” gets the “nist800r5_ctrlID” value, while “control_name”, “control”, “discussion”, and “related_controls” are set to “-”.
- Line 16. We check whether the row exists. If it does not, we keep the default values and move on; if it does, we continue to the next step.
- Line 18–21. If the row exists, we take the values of control_name, control, discussion, and related_controls from the row we found.
- Line 23–24. Whether the row exists or not, we turn the control_identifier, control_name, control, discussion, and related_controls variables into a data frame. After that, we add it to our placeholder.
- Line 26–41. We do exactly the same thing as before, but this time we get the additional controls. The additional controls follow a pattern like “control_identifier(“+number+”)”, for example CM-8(1), CM-8(2), AC-3(8). That is why I look for rows that start with “control_identifier(”, for example “CM-8(”.
- Line 42. We “return” the value to whoever calls this function; in our case, the main function above.
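The lookup idea can be sketched as follows. This is a simplified illustration, not the original function: `get_ctrl_detail_sketch` is a made-up name, the catalog is a tiny in-memory stand-in, the “Control Name” column header and its values are placeholders, and only one detail column is kept to stay short.

```python
import pandas as pd

# Tiny stand-in for the Rev. 5 catalog; names are placeholders, not the official text.
catalog = pd.DataFrame({
    "Control Identifier": ["CM-8", "CM-8(1)", "CM-8(2)", "AC-2", "AC-20"],
    "Control Name": ["name A", "name B", "name C", "name D", "name E"],
})

def get_ctrl_detail_sketch(catalog, ctrl_id):
    """Return the control row plus its additional points (hypothetical helper)."""
    ident = catalog["Control Identifier"]

    # Exact match on the identifier, e.g. "CM-8".
    base = catalog[ident == ctrl_id]
    if base.empty:
        # Row not found: fall back to the default value "-".
        base = pd.DataFrame({"Control Identifier": [ctrl_id], "Control Name": ["-"]})

    # Additional points start with the identifier plus "(", e.g. "CM-8(".
    extra = catalog[ident.str.startswith(ctrl_id + "(")]

    return pd.concat([base, extra], ignore_index=True)

print(get_ctrl_detail_sketch(catalog, "CM-8")["Control Identifier"].tolist())
# ['CM-8', 'CM-8(1)', 'CM-8(2)']
```

Searching for “AC-2(” rather than just “AC-2” matters: a plain prefix search for “AC-2” would also pick up “AC-20”, which is a different control.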
And this is the final Excel file generated by the program. Everything was done in a matter of seconds :)
So I found a little inefficiency in our code (I am so sorry) and I tried to fix it. I fixed it by moving the loading of the “NIST SP 800–53 rev 5” file to the top (into the main function rather than inside the “get_nist800r5_ctrl_detail” function), because reloading the file every time we call the function is wasteful and makes the program much slower.
I know it is a hard topic and maybe most of you will not read this far. But just in case you do, congratulations, and I hope my post gives you a bit of knowledge about how coding works and how fun it is. I hope you are encouraged to learn computer programming on your own. To GOD be all the glory, Soli Deo Gloria!
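The effect of that fix can be shown with a small sketch. Everything here is hypothetical: `load_catalog` stands in for the real Excel loading and only counts how often it is called, and `get_detail` is a simplified lookup that receives the already loaded data frame as an argument.

```python
import pandas as pd

load_calls = 0  # counts how many times the "file" is loaded

def load_catalog():
    """Hypothetical stand-in for reading the Rev. 5 Excel file."""
    global load_calls
    load_calls += 1
    return pd.DataFrame({"Control Identifier": ["CM-8", "PM-5", "AC-2"]})

def get_detail(catalog, ctrl_id):
    """Look up one control in a data frame that was loaded by the caller."""
    return catalog[catalog["Control Identifier"] == ctrl_id]

# Load once at the top, then reuse the same data frame for every lookup.
catalog = load_catalog()
details = [get_detail(catalog, ctrl_id) for ctrl_id in ["CM-8", "PM-5", "AC-2"]]

print(load_calls)
# 1
```

If the loading lived inside `get_detail`, the counter would reach 3 for these three lookups, which is the slowdown described above.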