Engineering Notebook: Entry 11

The end of my senior project journey has come, and as I prepare to create my final product, I am reflecting on how much I have learned in the past twelve weeks.

I began this project without any coding knowledge, and I am still learning to this day. Every day, I become better at understanding the logic behind coding. At first, I struggled with learning the syntax of Python and understanding the coding vocabulary. After becoming more comfortable with the language itself, I had to learn how to accommodate different libraries and how to work with type-specific functions. I also had to learn how to quickly go through the thought process of achieving a task with code. At first, this process was slow for me, but as I solved more problems and did more things with my code, I learned how to go through coding logic more quickly. I learned a lot about coding and a lot about how I learn well while working on this project.

I also learned that it is always okay to ask questions. When I began working in this internship, I felt intimidated when I did not know how to complete a task I was asked to do. However, as I continued working I was encouraged to ask questions and was grateful that both my external advisors were always open to helping me.

My code is finally running without errors, and I am so close to having findings. My Splatalogue data is being collected and my code is being adjusted to my own LMC data. Dr. Colgan and I have written nearly 200 lines of code, and watching it run without errors is very rewarding after debugging it for so long.

At the end of my project, I hope to have some peaks that my code has recognized and a collection of code that can be used with other astronomical data. I hope my final product will be useful to my external advisors in the future when they look at other star-forming regions.

I would like to thank Dr. Colgan and Dr. Rangwala for teaching me and helping me accomplish my personal goals of learning Python and writing some successfully running code. I have learned so much from my experience at this internship that I will take with me and hopefully use in my future endeavors.

I would also like to thank Mr. Adams, my school advisor, for helping me keep my ideas fresh and organized for all of my blog posts, proposals, and presentations.

Lastly, I would like to thank Ms. Belcher for organizing everyone’s senior project and keeping all of her students on their toes.

I will be presenting about my project journey on May 23rd at the Double Tree Hotel in San Jose, and this will be the end of my senior project journey. To those who will pursue a senior project in future years, I recommend exploring a topic that you have not explored before. It may inspire you in more ways than you may have expected.

As always, thanks for checking in!


Engineering Notebook: Entry 10

As we near the end of my senior project journey, my external advisors and I have been working hard to make sure that my code will yield findings.

This week I started working on the last part of my code, which compares already identified lines to the lines from the data.

To do this, I took data from Splatalogue, an online resource library.

Screen Shot 2018-04-27 at 11.44.19 PM.png

From Splatalogue, I entered the specifications for the data I wanted and then downloaded this data onto my laptop. From there, I converted the file into a text file so that I could read it more easily in my code.

Screen Shot 2018-04-27 at 11.46.26 PM

From the last time I read in a file, I knew I had to store the values from the data in separate numpy arrays. I worked on storing the frequency and intensity values in arrays.

Screen Shot 2018-04-27 at 11.48.22 PM.png

A concept I have struggled with numerous times throughout my coding experience appeared again. I had to convert the type of my array, and this time I had to convert the values to doubles.

Screen Shot 2018-04-27 at 11.49.57 PM.png
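
A minimal sketch of reading a two-column text file straight into arrays of doubles, as an alternative to converting the type afterwards. The sample text, its tab-separated layout, and the variable names are assumptions made for illustration; the real Splatalogue export has more columns and may use a different delimiter.

```python
import io
import numpy as np

# Made-up, tab-separated sample: a header row, then frequency and
# log intensity columns. This is NOT real Splatalogue data.
sample_text = (
    "Frequency\tIntensity\n"
    "100.0\t-5.0\n"
    "200.0\t-4.0\n"
    "300.0\t-3.5\n"
)

# np.loadtxt parses each field as a number, so both arrays come back
# as float64 (doubles) without a separate conversion step.
frequency, log_intensity = np.loadtxt(
    io.StringIO(sample_text),
    delimiter="\t",
    skiprows=1,        # skip the header row
    usecols=(0, 1),    # frequency and intensity columns
    dtype=np.float64,
    unpack=True,       # return one array per column
)

print(frequency.dtype, log_intensity.dtype)
print(frequency)
```

For a file on disk, the io.StringIO(...) argument would be replaced by the path to the text file.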

Dr. Colgan had recognized that all of the intensity values were log values. He said that the database probably stored the values as log values to save memory space. To be able to work with the accurate intensity values, I had to convert them from their log forms. This is the code I used to do that.

Screen Shot 2018-04-27 at 11.52.54 PM.png
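
A short sketch of undoing a logarithm with Numpy. It assumes the stored values are base-10 logarithms, which is not stated above; if they were natural logs, np.exp would be the inverse instead. The numbers are made up.

```python
import numpy as np

# Made-up log intensities, assumed here to be base-10 logarithms.
log_intensity = np.array([-5.0, -4.0, -3.5])

# Undo the logarithm: if y = log10(x), then x = 10**y.
intensity = 10.0 ** log_intensity

print(intensity)
```

Taking np.log10(intensity) afterwards should give back the original log values, which is a quick way to check that the right base was used.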

The next step that I started working on was saving the LMC data and putting it all into one file. The LMC is the Large Magellanic Cloud, which is the cloud that I will be working with as my own data set. The data was given to me in different files, so I had to open each one of them using CASA, the Common Astronomy Software Applications package, to find the brightest point in each file and then save the data from the spectra.

Here is what CASA would show me for the brighter regions that I would want to be looking at:

Screen Shot 2018-04-27 at 11.57.17 PM.png

I would take the region and the point of this bright “spot” and look at its molecular line survey.

Screen Shot 2018-04-27 at 11.58.40 PM.png

I would save these files as text files and then string them all together.

The next steps in my project involve writing code to compare frequencies and widths of the peaks found in the Splatalogue data and the LMC data. I look forward to tackling this challenge next week!
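
One way the frequency comparison could look is sketched below. The function name match_to_catalogue, the fixed tolerance, and all of the numbers are hypothetical; in practice the tolerance might be tied to the measured width of each peak.

```python
import numpy as np

def match_to_catalogue(peak_freqs, catalogue_freqs, tolerance):
    """For each observed peak, return the index of the nearest catalogue
    line, or -1 if no catalogue line lies within `tolerance`."""
    peak_freqs = np.asarray(peak_freqs, dtype=float)
    catalogue_freqs = np.asarray(catalogue_freqs, dtype=float)
    matches = np.full(peak_freqs.shape, -1, dtype=int)
    for i, freq in enumerate(peak_freqs):
        offsets = np.abs(catalogue_freqs - freq)
        nearest = int(np.argmin(offsets))
        if offsets[nearest] <= tolerance:
            matches[i] = nearest
    return matches

# Made-up frequencies, all in the same (arbitrary) unit.
catalogue = [100.0, 150.0, 200.0]
observed = [100.2, 170.0, 199.9]
print(match_to_catalogue(observed, catalogue, tolerance=0.5))
```

Peaks that come back as -1 have no catalogue line nearby, so they would be the ones worth a closer look.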

Thanks for checking in!


Engineering Notebook: Entry 8

It’s been a really productive last two weeks! I continued with noise estimation.

I began with the second part of the noise estimation for loop. As a brief summary, the first for loop had eliminated an unneeded part of the data and eliminated peaks that were under the sigma intensity value.

Screen Shot 2018-04-06 at 11.35.47 PM

The second for loop checks to see if there are any peaks close enough to other peaks that they become insignificant. We want to eliminate nearby peaks so that we can more clearly analyze the tallest and most significant peak.

Screen Shot 2018-04-26 at 9.46.11 AM.png
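
A rough sketch of the idea of dropping peaks that sit too close to a taller one. This is not the code in the screenshot: the function name keep_tallest_nearby, the min_separation parameter, and the sample numbers are all assumptions.

```python
import numpy as np

def keep_tallest_nearby(peak_freqs, peak_heights, min_separation):
    """Keep a peak only if no taller kept peak lies within min_separation."""
    peak_freqs = np.asarray(peak_freqs, dtype=float)
    peak_heights = np.abs(np.asarray(peak_heights, dtype=float))
    order = np.argsort(peak_heights)[::-1]   # visit the tallest peaks first
    kept = []
    for idx in order:
        too_close = any(
            abs(peak_freqs[idx] - peak_freqs[k]) < min_separation for k in kept
        )
        if not too_close:
            kept.append(int(idx))
    return sorted(kept)

# Made-up peaks: two close pairs and one isolated peak.
freqs = [10.00, 10.02, 10.50, 11.00, 11.01]
heights = [0.8, 1.2, 0.5, 0.9, 0.3]
print(keep_tallest_nearby(freqs, heights, min_separation=0.05))
```

Visiting the peaks from tallest to shortest means that, within each crowded group, the tallest peak is always the one that survives.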

After these two loops, I have a number of print statements to print out values such as the value of intensity and the frequency of each peak that is recognized as significant that comes out of the loop.

After debugging these for loops, I saw that within my set of 4000 points there were 9 significant peaks. We wanted to be able to easily recognize these peaks by adding a marker to the tops of them on the plot. This was the code that creates the markers. I struggled with this a little bit because the negative peaks were not being marked. This was because I forgot to take the absolute value of intensity. Without taking the absolute value, only the positive peaks are evaluated with sigma.

Screen Shot 2018-04-26 at 9.52.11 AM.png
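
The effect of the missing absolute value can be shown with a few made-up numbers. The sigma value of 0.004 is only an example threshold, and the array contents are invented.

```python
import numpy as np

sigma = 0.004   # example threshold; in practice it comes from the noise estimate
intensity = np.array([0.001, 0.010, -0.012, 0.002, -0.001])

# Comparing the raw values misses the negative peak at index 2.
positive_only = np.where(intensity > sigma)[0]

# Comparing the absolute values catches peaks of either sign.
both_signs = np.where(np.abs(intensity) > sigma)[0]

print(positive_only, both_signs)
```

The indices in both_signs could then be used to place markers, for example by plotting frequency[both_signs] against intensity[both_signs] with a marker style in matplotlib.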

Next, I wrote a while loop. This while loop was supposed to find the ‘ends’ of the peaks. The ‘ends’ would be the frequency value at which the peak begins and where it ends. I called the value at which it begins the lower line and the value at which it ends the upper line. These values would be helpful later when solving for equations.

Screen Shot 2018-04-26 at 9.55.11 AM.png
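
A sketch of how a while loop can walk outward from a peak to find its ends. The stopping rule used here (stop when the absolute intensity is no longer above a threshold) is an assumption, as are the function name find_peak_ends and the sample values.

```python
import numpy as np

def find_peak_ends(intensity, peak_index, threshold):
    """Walk outward from a peak while |intensity| stays above the threshold.

    Returns (lower_index, upper_index), the last indices on each side
    that are still above the threshold.
    """
    strength = np.abs(np.asarray(intensity, dtype=float))
    lower = peak_index
    while lower > 0 and strength[lower - 1] > threshold:
        lower -= 1
    upper = peak_index
    while upper < len(strength) - 1 and strength[upper + 1] > threshold:
        upper += 1
    return lower, upper

# Made-up intensities with a single peak at index 3.
intensity = [0.001, 0.002, 0.006, 0.010, 0.007, 0.003, 0.001]
lower, upper = find_peak_ends(intensity, peak_index=3, threshold=0.004)
print(lower, upper)
```

The lower and upper lines would then be the frequency values at those two indices.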

There were six equations I needed to solve for.

Area: The area under the curve of the peak.

Screen Shot 2018-04-26 at 11.02.56 AM.png

Frequency: The frequency at which the peak exists. For this peak, the frequency is .00015.

Screen Shot 2018-04-26 at 11.08.29 AM.png

Width: The width of the peak.

Screen Shot 2018-04-26 at 11.18.26 AM
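
The three quantities can be illustrated with simple stand-in formulas. These are not the equations in the screenshots: the trapezoid rule for area, the intensity-weighted mean for frequency, and the end-to-end distance for width are assumptions chosen for clarity, and the error bounds are left out.

```python
import numpy as np

def peak_properties(frequency, intensity, lower, upper):
    """Area, intensity-weighted frequency, and width between two indices."""
    f = np.asarray(frequency, dtype=float)[lower:upper + 1]
    y = np.asarray(intensity, dtype=float)[lower:upper + 1]
    # Trapezoid rule for the area under the curve.
    area = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(f)))
    # Intensity-weighted mean frequency of the peak.
    centre = float(np.sum(f * y) / np.sum(y))
    # Width taken simply as the distance between the two ends.
    width = float(f[-1] - f[0])
    return area, centre, width

# A made-up, symmetric triangular peak.
frequency = [1.0, 2.0, 3.0, 4.0, 5.0]
intensity = [0.0, 1.0, 2.0, 1.0, 0.0]
print(peak_properties(frequency, intensity, lower=0, upper=4))
```

For a symmetric peak like this one, the weighted frequency lands on the middle point, which is a handy sanity check.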

Along with area, frequency, and width, I needed to solve for their error bounds. This would help us understand how accurate our original values are. Dr. Colgan, my external advisor, had previously worked out these equations for the previous intern, so I took the equations from her code and put them into mine.

Screen Shot 2018-04-26 at 11.23.20 AM.png

One of the hardest parts about debugging these equations was working with the parentheses. We had to go through and make sure that all the parentheses were in pairs. The values we were receiving still seemed off, so Dr. Colgan looked through the equations and realized that the **1/2 command was not working. He then told me to use np.sqrt instead because it works and is a more efficient way to take the square root.
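
One likely reason a **1/2 expression misbehaves is operator precedence: Python applies ** before /, so x**1/2 means (x**1)/2. The sketch below shows the difference; whether this was the exact cause in the original code is an assumption.

```python
import numpy as np

x = 16.0

halved = x**1/2          # parsed as (x**1) / 2, not a square root
rooted = x**(1/2)        # square root in Python 3; in Python 2, 1/2 is 0
with_numpy = np.sqrt(x)  # unambiguous, and works element-wise on arrays

print(halved, rooted, with_numpy)
```

Using np.sqrt avoids both the precedence issue and the integer-division issue, and it reads more clearly inside a long equation.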

These equations took a very long time to debug, but now that I am finished with them, I can finally start working on comparing the lines from the data to data found in labs.

Thanks for checking in!


Engineering Notebook: Entry 7

Every Friday when I pull into work, I see NASA employees of all ages playing a soccer game on the field next to my building. They all wear cleats, soccer shorts, and jerseys, and I think it is great that the employees here have become friends not only by being co-workers, but also by bonding over similar interests. Although it was raining today, it was great to see the NASA soccer community was still out there in their official soccer gear enjoying their Friday afternoons.

This week, I finished step 2: Noise Estimation and moved on to step 3: Evaluating the Ranges.

The first thing I did was create a function that would get the intensity value for a specific inputted frequency value. Dr. Colgan helped me a lot with writing this piece of code. What we discovered was that the np.where command does not return a single index; it returns a tuple of index arrays. This is why there is an int() command to turn the matching index into a plain integer. A plain integer is much easier to work with because it can be used directly to index other arrays, whereas a tuple is immutable and cannot be changed later on.
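
A small sketch of looking up an intensity by frequency with np.where. The arrays and the function name get_intensity are made up for illustration, and the lookup assumes the requested frequency appears exactly in the array.

```python
import numpy as np

# Made-up data: one intensity value per frequency value.
frequency = np.array([11.0, 11.5, 12.0, 12.5])
intensity = np.array([0.002, 0.009, 0.001, 0.005])

def get_intensity(target_frequency):
    """Return the intensity at the point whose frequency matches exactly."""
    matches = np.where(frequency == target_frequency)  # a tuple of index arrays
    index = int(matches[0][0])                         # first match as a plain int
    return intensity[index]

print(type(np.where(frequency == 12.0)))
print(get_intensity(12.0))
```

If the frequency is not in the array, matches[0] is empty and the lookup raises an IndexError. For frequencies that may not match exactly, np.argmin(np.abs(frequency - target_frequency)) would pick the closest point instead.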

Screen Shot 2018-04-06 at 11.26.30 PM

The previous intern had already identified areas with great noise, so I used her ranges for the noise estimation. Picking up where I left off last week, I was solving for different sigma values from the different ranges. I turned my sequence of code into a function.

This made it easy for me to get the sigma value for any range at any point in my code because all I have to do is call the function.

Screen Shot 2018-04-06 at 11.49.42 PM.png

A challenge I faced this week was trying to use a variable that I had defined inside a function. It did not occur to me that a variable defined inside a function cannot be used outside of it. Eventually, I realized I can assign the result of a function call, get_sigma(range_1), to a variable and still be able to use that variable later.
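
The scope issue can be sketched as follows. The body of get_sigma here is a guess at what such a function might do (a standard deviation over a frequency range), and the data and range are made up; only the names get_sigma and range_1 come from the text above.

```python
import numpy as np

# Made-up data: nine points between 10.0 and 14.0.
frequency = np.linspace(10.0, 14.0, 9)
intensity = np.array([0.001, -0.002, 0.001, 0.0, 0.02, 0.03, -0.001, 0.002, -0.001])

def get_sigma(freq_range):
    """Standard deviation of the intensity inside (low, high)."""
    low, high = freq_range
    in_range = (frequency >= low) & (frequency <= high)
    sigma = np.std(intensity[in_range])   # `sigma` exists only inside the function
    return sigma

range_1 = (10.0, 11.5)          # hypothetical line-free range
sigma_1 = get_sigma(range_1)    # keep the returned value under a new name
print(sigma_1)
```

The name sigma disappears when the function finishes, but the value it returned lives on in sigma_1 and can be used anywhere later in the code.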

The next thing I did was have my code read in 4000 data points. The first 4000 points cover a large enough part of the data that we can get multiple ranges to do noise estimation with. Here is the graph with those data points:

Screen Shot 2018-04-02 at 6.07.21 PM

The next part of the process I started working on was peak identification. Using the calculated sigma values, I was supposed to make two for loops to try to eliminate lines with intensity values lower than the sigma value taken from the noise estimation process.

I was not able to finish the second for loop this week, but here is the code I have for the first for loop.

Screen Shot 2018-04-06 at 11.35.47 PM.png

The first if statement ignores all lines between the frequencies of 11.6 and 12.71 and the second if statement checks to see if the intensity values of the peaks are higher than the sigma value. This will be the code that helps us isolate the significant lines!
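
The logic of those two if statements can be sketched like this. The excluded band of 11.6 to 12.71 is taken from the description above; the data, the sigma value, and the variable names are made up.

```python
import numpy as np

# Made-up data points.
frequency = np.array([11.0, 11.8, 12.5, 13.0, 13.5])
intensity = np.array([0.010, 0.020, 0.001, 0.002, 0.008])
sigma = 0.004   # example value; in practice this comes from the noise estimate

candidate_indices = []
for i in range(len(frequency)):
    # Skip the excluded band between 11.6 and 12.71.
    if 11.6 < frequency[i] < 12.71:
        continue
    # Keep only points that rise above the noise level.
    if intensity[i] > sigma:
        candidate_indices.append(i)

print(candidate_indices)
```

Note that the point at 11.8 is skipped even though it is the tallest, because it falls inside the excluded band.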

Once again, thanks for checking in!


Engineering Notebook: Entry 6

This past week, I only went in to work three days rather than five. It was a rough week because progress moved slowly.

I began by finding the sigma value. The sigma value represents the variation in our data values. We solve for sigma to get an intensity threshold that decides which values are significant and which are noise.

Screen Shot 2018-04-01 at 11.52.39 AM.png

For example in this graph, if our sigma equals .004, then imagine drawing a horizontal line at this value. All the peaks above this line are lines we will consider significant, while all the peaks below the line will be considered noise.

This is the code I used to find the sigma value.

Screen Shot 2018-04-01 at 11.44.24 AM

np.std is a function in the Numpy library that computes the standard deviation, which I store in the variable named ‘sigma’.

My next assignment was to select sub-arrays of values by frequency rather than by index.

I was particularly struggling with this because I do not have a large amount of experience using tuples. Dr. Colgan, my external advisor, wrote up some code to help me, but I am still going through it to try and understand it. Hopefully, I will have a good grasp of it tonight, and can get started on the next part of my project: finding ranges with a lot of noise.
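
One common way to pick out a sub-array by frequency is a boolean mask. This is a sketch of that general technique, not Dr. Colgan's code; the function name slice_by_frequency and the data are made up.

```python
import numpy as np

# Made-up data points.
frequency = np.array([10.0, 10.5, 11.0, 11.5, 12.0, 12.5])
intensity = np.array([0.001, 0.004, 0.009, 0.003, 0.002, 0.007])

def slice_by_frequency(low, high):
    """Return the frequency and intensity values with low <= frequency <= high."""
    mask = (frequency >= low) & (frequency <= high)   # one True/False per point
    return frequency[mask], intensity[mask]

sub_freq, sub_intensity = slice_by_frequency(10.5, 11.5)
print(sub_freq, sub_intensity)
```

The mask has the same length as the data, so the same mask can be applied to both arrays and the two results stay lined up.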

Thanks for checking in!


Engineering Notebook: Entry 5

Entering the NASA Ames Research Center is its own little adventure every day. After passing the security gate, there are about six stop signs I need to stop at before making the turn onto a dusty road that brings me to building N-245.

Each day when I see a new group of tourists, researchers, scientists, and engineers taking pictures with the huge NASA sign on the wind tunnel, I think about how exciting it is to work in a place where people just beginning their research journeys and those who are close to ending theirs collaborate.

I started this week off by clearing out ALMA data from my laptop. While working with one of my external advisors Dr. Naseem Rangwala, we had looked through all the data I had downloaded and decided which data was okay to delete from my laptop. The raw data was taking up too much space, and I will not be working with the Orion KL data from the ALMA until I am finished working with the previous intern’s data, so having the data on my laptop was unnecessary.

Before my trip to Canada, I was in the process of translating the previous intern’s code to Python. Her process to organize her data is the exact same process I will be taking when working with the Orion KL data. Here is a brief summary of the process:

  1. Smoothing: Smoothing involves averaging over a certain number of points (in our case 7 or 5) to try to reduce the ‘noise’ or the insignificant peaks in a data set.
  2. Noise estimation: This involved identifying regions without any lines and computing estimates of the noise in the data. This is necessary to distinguish real lines from noise.
  3. Evaluating the ranges: This was the process of finding the peaks in the data set that are greater than the noise found in the previous step.
  4. Comparing to catalogues: This step involves comparing data sets to a catalogue of data of already characterized molecules. This way ‘weed’ molecules can be isolated and recognized from the data sets.

In Entry 3 of my Engineering Notebook, I finished by successfully printing out the first 100 lines of the raw data of the emission from the Sgr B2N molecular cloud.

After clearing out space on my laptop, I worked on plotting the first 100 points of data. I had worked extensively with my external advisor Dr. Sean Colgan to understand how to plot Numpy arrays in matplotlib. These are both libraries that help me put the data into arrays and then plot them.

With the help of Dr. Colgan and the internet, I was able to write code that would plot the first 100 points of the raw data. However, we ran into a problem where the plot would appear as a figure, but it would not have any data plotted in it. All the axes labels and numbers that were in the code appeared, but the data failed to be plotted. While I was looking for a way to rewrite my code to solve the problem, Dr. Colgan was looking through my code for specific errors that may have been causing it. Eventually, Dr. Colgan discovered that I had been feeding strings into the plot rather than numbers. The strings needed to be converted to floating point numbers.

Screen Shot 2018-03-23 at 5.28.00 PM
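
A minimal sketch of the string-to-float conversion, using made-up values. Values read from a text file start out as strings, so they need converting before they behave like numbers.

```python
import numpy as np

# Values as they come out of a text file: every field is a string.
raw_frequency = ["100.5", "100.6", "100.7"]
raw_intensity = ["0.002", "0.010", "0.004"]

# Convert to floating point arrays before plotting or doing arithmetic.
frequency = np.array(raw_frequency).astype(float)
intensity = np.array(raw_intensity).astype(float)

print(frequency.dtype, intensity.max())
```

Checking the dtype of an array before plotting it is a quick way to catch this kind of problem.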

After converting the variables to floating point numbers, the code finally worked and produced this plot.

Screen Shot 2018-03-23 at 5.20.10 PM

This is the plot of the first 100 values of the raw data.

The next step was to smooth the intensity values. Expecting Numpy to have a smooth command, I searched the internet for instructions on how to smooth in Numpy. As it turns out, Numpy did not have a smoothing command for the specific type of smoothing we wanted to use. So, Dr. Colgan and I looked on the internet for smoothing tools. We ended up finding a smoothing tool in another Python library named “Scipy”.

Here is what the command looks like: Screen Shot 2018-03-23 at 5.32.24 PM
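
For readers without the screenshot, the idea of averaging over a window of points can be sketched with Numpy alone. This is a plain moving average written with np.convolve, used here as a stand-in; it is not the Scipy command shown above, and the function name boxcar_smooth is made up.

```python
import numpy as np

def boxcar_smooth(values, window=7):
    """Moving average over `window` points (window should be odd)."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="same" keeps the output the same length as the input; the
    # first and last few points are averaged against implicit zeros.
    return np.convolve(values, kernel, mode="same")

# A made-up spike of height 7 in otherwise flat data.
raw = np.array([0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0, 0.0, 0.0])
smoothed = boxcar_smooth(raw, window=7)
print(smoothed)
```

The single tall spike is spread out into a low, wide bump, which is exactly how smoothing suppresses narrow noise spikes.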

After smoothing the intensity, I put in a new plt.plot() command to plot the smoothed data on top of the raw data.

Screen Shot 2018-03-23 at 5.34.31 PM

Here is what the plot looked like:

Screen Shot 2018-03-23 at 5.35.16 PM.png

The red line is the smoothed data line while the green line is the raw data line.

I finished up this week by making my code more accessible for “general” use. I was trying to make my code general because I wanted to make it easily applicable to any data set it is presented with. The code was already very general and all the specific values were easily changed, but if I wanted to plot only a specific number of lines, I had to make a separate text file containing that number of lines from the raw data to put into the code.

For example, each time I wanted to plot only the first 3000 values, I had to create an entirely new text file with those first 3000 lines in it. Instead, I wanted to be able to specify a number within the code itself, so that the code reads only the first 3000 lines of the data.

To make the code read a specific number of lines, I had to specify the range right next to the read lines command. This was a refreshingly quick problem to solve compared to the others I had faced this week.
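
A sketch of limiting how many lines are read by slicing the result of readlines(). The function name read_first_lines and the in-memory stand-in file are made up for illustration.

```python
import io

def read_first_lines(file_obj, number_of_lines):
    """Return at most `number_of_lines` lines from an open text file."""
    return file_obj.readlines()[:number_of_lines]

# A small in-memory stand-in for the raw data file.
fake_file = io.StringIO("".join(f"{i}\t{i * 0.001}\n" for i in range(10)))
first_three = read_first_lines(fake_file, 3)
print(first_three)
```

readlines() still reads the whole file before the slice is taken. For a very large file, itertools.islice(file_obj, number_of_lines) stops reading after the requested number of lines.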

I definitely learned a lot this week, and I am really excited to get started on noise estimation next week!


Engineering Notebook: Entry 4

This week I was in Victoria, Canada for a robotics competition, so please enjoy this brief blog post about what I have done during my senior project and where I am on my original timeline.

So far, I have spent the majority of my time learning Python and downloading software. I have also been learning how to interpret data and how to interpret the previous intern’s code.

Before I left, I began trying to translate the previous intern’s code into Python, and successfully coded the first step of smoothing, which I discussed in my last blog post.

By this week, I was supposed to be well into looking at the old code, but I will be trying to catch up next week!

As always, thank you for checking in!



Engineering Notebook: Entry 3

This week I began looking at the previous intern’s code. She was looking at the Sgr B2N molecular cloud for uncharacterized molecular emission. I began going through her code to understand what she was trying to do and to break down each step of the process she took to reach her end goal.

I will be trying to take her code and translate it to Python to see if I can mimic her work in a different coding language so that we can apply it to the data from the ALMA telescope.

Here is a brief explanation of the process she took:

  1. Smoothing: Smoothing involves averaging over a certain number of points (in our case 7) to try to reduce the ‘noise’ or the insignificant peaks in a data set.
  2. Noise estimation: This involved identifying regions without any lines and computing estimates of the noise in the data. This is necessary to distinguish real lines from noise.
  3. Evaluating the ranges: This was the process of finding the peaks in the data set that are greater than the noise found in the previous step.
  4. Comparing to catalogues: This step involves comparing data sets to a catalogue of data of already characterized molecules. This way ‘weed’ molecules can be isolated and recognized from the data sets.

After going through the Matlab code and understanding the processes that were taken I began coding in Python.

The first goal I set out to accomplish was to make an array of the data and print out the first 100 lines of the 1.4 million line raw spectrum data that the previous intern was using.

The raw spectrum data looks like this: Screen Shot 2018-03-09 at 5.46.32 PM.png

The first column has values for frequency and the second column has values for intensity.

At first, I struggled to open and read the file because my code could not locate it. Ultimately, entering the specific path of the file and converting the file to a text file solved this problem.

After locating the file, I made an array using NumPy, which is similar to a list in Python, of the numbers in the raw data. Within this array, I used two for loops with a .split('\t') method to split the columns, and then I used another for loop to go through and read each line. Then I printed out the variable I had set my array equal to, making sure to print out the first 100 lines of the data.
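
A simplified sketch of splitting tab-separated lines into columns. It uses a single loop and three made-up lines rather than the real file, so it shows the idea rather than the exact code described above.

```python
# Three made-up lines in the same two-column, tab-separated layout.
raw_lines = ["100.5\t0.002\n", "100.6\t0.010\n", "100.7\t0.004\n"]

rows = []
for line in raw_lines:
    # strip() drops the trailing newline; split("\t") separates the columns.
    frequency_text, intensity_text = line.strip().split("\t")
    rows.append([frequency_text, intensity_text])

for row in rows[:100]:   # print at most the first 100 rows
    print(row)
```

At this point every value is still a piece of text; it has only been separated into a frequency column and an intensity column.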

This is where I stopped my week!

As always, thanks for checking in!

Engineering Notebook: Entry 2

These past two weeks I have been working through Python courses to familiarize myself with the syntax and work through the different challenges of coding. I have also downloaded various ALMA data so that I could familiarize myself with the software I will be using to interpret it.

Originally, I was going through tutorials on and Codeacademy, but I felt as though the tutorials did not have enough explanations. I found that I was becoming confused with the examples pretty easily, so I started taking a Python course on Udemy. This course was much more in depth and better suited towards a new coder like me. I will be finished with my Python course by Monday, so I will be ready to start working with the previous intern’s code!

CASA is the Common Astronomy Software Applications package. It is a tool to help go through and view the data from the ALMA telescope. This week I was taught how to use this tool to pull Molecular Line Surveys from the FITS files. The data comes in three dimensions, and the software helps me locate the areas with a lot of “noise”. Then I can pick different frames at different frequencies to then pull up a molecular line spectrum.

I look forward to diving into the Orion KL region in the upcoming weeks!

NASA at Night

This is a picture of the runway outside of the building I work in.

Thanks again for checking in!


Engineering Notebook: Entry 1

This week I spent a lot of time getting situated into my internship, so I primarily worked on downloading software, testing software, and finishing my training. This blog post will have brief summaries of what I did each day this past week.


I started my week off by finally meeting my external advisors Dr. Sean Colgan and Dr. Naseem Rangwala in person at the NASA Ames Research Center! I was given a crash course on where I would be working, what resources I had access to, and what training I had to complete. Also, I had downloaded the previous intern’s code so that in the future, I could refer to her algorithms and build on her work.


I began going through safety training and spent a lot of time installing various software. I installed a newer version of Python on my laptop and began testing some very simple code.


I successfully installed an environment on my laptop so that I could save my code projects in a simple way. I also started working through Python tutorials to advance my coding skills.


I continued to go through Python tutorials and practice problems. I also started downloading the ALMA data that I will be using for my project. I ran into some issues with the website and could not download the data successfully.



Today, I tried to fix some issues with software installation. The ALMA data finally began downloading on my laptop, so I will have the data ready for next week. I also tried to download software named CASA, which would help me interpret the data I downloaded. I did run into a few problems with installing this software, so I will continue to battle with it next week!


Thank you for checking in!