Tags: data

The Case for Data-Driven Open Source Development

The lack of standardized metrics, datasets, methodologies and tools for extracting insights from Open Source projects is real.
Open Source Metrics That Actually Matter
Let’s take a look at the first part of the problem: the metrics. OSS project stakeholders simply don’t have the data to make...

The Growing Significance Of DevOps For Data Science

DevOps involves infrastructure provisioning, configuration management, continuous integration and deployment, testing and monitoring. DevOps teams have been working closely with the development teams to manage the lifecycle of applications effectively. Data science brings additional...

Introducing ODPi Egeria – The Industry’s First Open Metadata Standard

Organizations looking to better locate, understand, manage and gain value from their data have a new industry standard to leverage. ODPi, a nonprofit Linux Foundation organization focused upon accelerating the open ecosystem of big data solutions, recently announced ODPi Egeria, a new project that...

Why the Future of Data Storage is (Still) Magnetic Tape

Studies show [PDF] that the amount of data being recorded is increasing at 30 to 40 percent per year. At the same time, the capacity of modern hard drives, which are used to store most of this, is increasing at less than half that rate. Fortunately, much of this information doesn’t need to be...

A Deep Dive Into Data Lakes

In the age of Big Data, we’ve had to come up with new terms to describe large-scale data storage. We have databases, data warehouses and now data lakes. While they all contain data, these terms describe different ways of storing and using that data. Before we discuss data lakes and why they are...

Tech Heavyweights Create Open Source Project to Transfer Data

Google, Facebook, Twitter, and Microsoft launched a new open source project aimed at making it easier for users to transfer data between services without having to download it and upload it to another service. The use cases for this type of open source software are wide ranging. For example, an end...

Doing Good Data Science

There has been a lot of healthy discussion about data ethics lately. We want to be clear: that discussion is good, and necessary. But it’s also not the biggest problem we face. We already have good standards for data ethics. The ACM’s code of ethics, which dates back to 1993, is clear, concise, and...

Comprehensive Beginner’s Guide to Jupyter Notebooks for Data Science & Machine Learning

Jupyter Notebooks allow data scientists to create and share their documents, from code to full-blown reports. They help data scientists streamline their work and enable greater productivity and easier collaboration. For these and several other reasons you will see below, Jupyter Notebooks are one of...

Removing the Storage Bottleneck for AI

If the history of high performance computing has taught us anything, it is that we cannot focus too much on compute at the expense of storage and networking. Having all of the compute in the world doesn’t mean diddlysquat if the storage can’t get data to the compute elements – whatever they might...

Here’s Why You Should Secure Your Etcd Deployment

Etcd, a key-value store and a core component of Kubernetes clusters, is used to store highly sensitive configuration data but is also easily left unprotected, as a developer recently found. Puerto Rican software developer Giovanni Collazo was looking into etcd, first developed by CoreOS, and...