With the imminent demise of Google+, I figured now is the time to export my data and import the original posts into this blog. It's trivially easy to find instructions on the Internet for selecting and downloading your data through Google Takeout. Basically, go to Google Takeout, choose "Select None", scroll down the page, and find "Google+ Stream". Move the little slider on the right and choose your format, either HTML or JSON. I figured JSON is easier to parse and therefore quicker for re-creating the Google+ posts on this blog.

Unzipping the JSON archive results in thousands of files: at least three text files per post, plus the images tied to each post. The three text files appear to be a JSON dump of the post and some metadata, a comma-delimited file of metadata about the post, and a comma-delimited file of metadata about each image included in the post. In some world, this makes sense, so let's just keep going.

At the root of the extracted folders is an index.html file. This displays a nice start page for your posts, or so I hoped. Ok, so click on "Google+ Stream", which brings up a page for your stream. Being curious, I clicked on "Learn more", which takes you to this support answer: "By default your stream activity is delivered in the HTML format and is viewable in any web browser. However, it may be more difficult to programmatically parse the information." My entire goal is to programmatically parse the information, so I went back and got the JSON export.

It turns out the folders of "Photos" and "Posts" are just directory listings. So I started down the path of parsing the JSON files to recreate the posts. Here's a simple example of the JSON about a post:

"title": "INSIDE DARK WEB PREMIERES IN NEW YORK CITY - Dark Web conference 2016",
"content": "Speaking at \"Inside the Dark Web\" this week."

Most of that is pretty easy to parse with nearly any JSON tool. The "content" field is not, because it includes snippets of HTML, escaped text, and, in other posts, seemingly encoded text (base64, unicode, hex, and who knows what else). Over the 3,000+ JSON files, some posts are more complex than others, and the whole set is complex.

I spent some time hacking up solutions to handle the content field. The assumption was that while it's messy, it's at least produced by some Google tool which exports data the same way every time. After spending an afternoon hacking away, I figured I had real work to do and put it aside. I then hired a few coders who wanted the challenge. Three have quit, saying it's probably faster to hire people on Mechanical Turk to manually re-create the posts than it is to do the same programmatically. I have some scripts which take care of matching the JSON file with the metadata and finding the images associated with the posts. Converting all of this into Markdown to import into this blog is where everything came to a standstill. I have two more coders cranking away at it. Plan B is to just scrape the existing posts while they're still available and see how much I can convert to Markdown that way.

The whole idea here is that I want to migrate my content from dead networks into something I control and can keep alive. In fact, it existed for 4 years before that, initially as .plan file updates (accessible by finger), then converted to gopher, and then as HTML updates for NCSA Mosaic users. The most common solution is to simply abandon the content and leave it as a point in time until it's taken down, like Geocities, MySpace, etc. This latter path is what I took when I left my shell server and lost the files. It was 1994, and I figured who cares about the content I wrote back then. Of course, re-reading some of the older posts, I begin to wonder how valuable it all is too. However, preserving history, and not having to pull it out of the Wayback Machine, seems like something worthwhile.
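To give a feel for the first step of the job, here's a minimal sketch of collecting posts from the export. The folder layout ("Posts" holding one JSON file per post) and the "title"/"content" keys are assumed from the example above, not from any official Takeout spec:

```python
import json
from pathlib import Path

def load_posts(takeout_dir):
    """Collect (title, content) pairs from the per-post JSON files.

    Assumes a "Posts" folder containing one JSON file per post,
    each with "title" and "content" keys, as in the example above.
    """
    posts = []
    for path in sorted(Path(takeout_dir, "Posts").glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        posts.append((data.get("title", ""), data.get("content", "")))
    return posts
```

This only gathers the raw fields; matching each post to its metadata CSVs and images, as my scripts do, would layer on top of it.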
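As a rough illustration of the content-field cleanup, here's a first-pass sketch that handles only the common case of escaped text and HTML snippets. It makes no attempt at the odd base64/hex-encoded posts, and the tag handling is deliberately naive:

```python
import html
import re

def content_to_text(content):
    """First pass at a Google+ "content" field: unescape HTML
    entities, turn <br> into newlines, and strip remaining tags.
    Does not handle the base64/unicode/hex-encoded posts.
    """
    text = html.unescape(content)
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.I)
    text = re.sub(r"<[^>]+>", "", text)  # drop any remaining tags
    return text.strip()
```

From plain text like this, converting to Markdown is the easy part; it's the inconsistent encodings across thousands of posts that stall everything.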